
[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-08 Thread stack (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702625#comment-13702625 ]

stack commented on HBASE-6295:
--

So, let's move the above discussion over to hbase-8810.  hbase- hopefully 
restores timings so hbase-it can pass again.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> Today the batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: the retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
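The proposed per-location loop can be sketched in Java as below. This is a minimal sketch under stated assumptions, not the committed HBase code: the class name PerLocationBatcher and the locate/send callbacks are hypothetical stand-ins for "get location" and "send to region server", and the retry/error management the description calls the hard part is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical sketch: group operations per location as they arrive and flush a
// location's buffer in the background once it exceeds maxLocationSize.
// O = operation type, L = location type. Retries/error management are omitted.
class PerLocationBatcher<O, L> {
    private final Function<O, L> locate;        // hypothetical "get location"
    private final BiConsumer<L, List<O>> send;  // hypothetical RPC to a region server
    private final int maxLocationSize;
    private final ExecutorService pool;
    private final Map<L, List<O>> todo = new HashMap<>();
    private final List<Future<?>> inFlight = new ArrayList<>();

    PerLocationBatcher(Function<O, L> locate, BiConsumer<L, List<O>> send,
                       int maxLocationSize, ExecutorService pool) {
        this.locate = locate;
        this.send = send;
        this.maxLocationSize = maxLocationSize;
        this.pool = pool;
    }

    void submitAll(List<O> ops) {
        for (O o : ops) {
            L loc = locate.apply(o);
            List<O> list = todo.computeIfAbsent(loc, k -> new ArrayList<>());
            list.add(o);
            if (list.size() > maxLocationSize) {
                flush(loc, list);  // don't wait, continue the loop
            }
        }
        // send remaining
        for (Map.Entry<L, List<O>> e : todo.entrySet()) {
            if (!e.getValue().isEmpty()) {
                flush(e.getKey(), e.getValue());
            }
        }
        todo.clear();
        // wait for every background send to finish
        for (Future<?> f : inFlight) {
            try {
                f.get();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ex);
            } catch (ExecutionException ex) {
                throw new RuntimeException(ex.getCause());
            }
        }
        inFlight.clear();
    }

    // Copy the buffer into a separate batch, clear the buffer, and send asynchronously.
    private void flush(L loc, List<O> list) {
        List<O> batch = new ArrayList<>(list);
        list.clear();
        inFlight.add(pool.submit(() -> send.accept(loc, batch)));
    }
}
```

The design point it illustrates is that a busy location is flushed as soon as its own buffer is full, so slow regions no longer hold back sends to the others; the single wait happens only once, at the end.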

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-03 Thread Nicolas Liochon (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699393#comment-13699393 ]

Nicolas Liochon commented on HBASE-6295:


In HBASE-8810, I proposed to have:
- a file called hbase-settings-sample.xml that would not be included when we 
read the conf (while we read hbase-default and hbase-site today). It would be 
for documentation only.
- a unit test to load this file and compare it with the code defaults, to ensure 
our doc is in line with the code.

The script would work as well. I think the tests should use the defaults. Then, 
inside the tests, we can change them, for example when we know it's gonna fail 
and we don't want to try 20 times.
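A rough sketch of the proposed doc-vs-code unit test follows. Only the file name hbase-settings-sample.xml comes from the proposal; the class SettingsDocCheck, its methods, and passing the code defaults in as a plain map (a real test would read them from the constants in code) are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical check: parse the <property> entries of a sample settings file and
// report every one whose documented value differs from the default in code.
class SettingsDocCheck {

    static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse settings XML", e);
        }
    }

    // One message per documented key that is missing from, or differs from, the code defaults.
    static List<String> mismatches(Map<String, String> documented, Map<String, String> codeDefaults) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : documented.entrySet()) {
            String inCode = codeDefaults.get(e.getKey());
            if (!e.getValue().equals(inCode)) {
                out.add(e.getKey() + ": doc=" + e.getValue() + ", code=" + inCode);
            }
        }
        return out;
    }
}
```

A test would assert that mismatches(...) is empty, so any drift between the sample file and the code fails the build.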



[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-03 Thread stack (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699375#comment-13699375 ]

stack commented on HBASE-6295:
--

On:

{code}
the defaults in the code
hbase-default.xml in hbase-common (seems to be used when doing the integration 
test with a cluster)
hbase-site.xml in hbase-server/test (seems to be used when you run the 
integration test with a minicluster)
hbase-site.xml in hbase-client
hbase-site.xml in conf
{code}

Removing hbase-default.xml is a radical notion.  hbase-default.xml used to be 
in conf for all to view and adapt into an hbase-site.xml.  HBASE-3090 moved it 
out of conf and into the jar so that new installs picked up new defaults.  This 
made hbase-default.xml content effectively opaque unless you unpacked the jar or 
went to the refguide to read the doc we generate from it (See 
http://hbase.apache.org/book.html#hbase.site).  My guess is no one looks at the 
refguide.  This would seem to render hbase-default.xml near useless?  Yet we 
have to maintain it.  In the configuration code, we'll favor the hbase-default* 
setting over what we have in code.
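The precedence described here can be modeled in a few lines. This is a simplified, hypothetical model (LayeredConf is not Hadoop's Configuration class, which also handles things such as final properties and variable expansion): resources are applied in load order, later ones override earlier ones, and the default passed in code only applies when no XML resource sets the key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the lookup order: XML resources win over the default
// passed in code, and a later resource (hbase-site.xml) overrides an earlier
// one (hbase-default.xml).
class LayeredConf {
    private final Map<String, String> merged = new HashMap<>();

    // Call in load order, e.g. hbase-default.xml first, then hbase-site.xml.
    void addResource(Map<String, String> props) {
        merged.putAll(props);
    }

    // The code default is used only when no loaded resource sets the key.
    int getInt(String key, int codeDefault) {
        String v = merged.get(key);
        return v == null ? codeDefault : Integer.parseInt(v.trim());
    }
}
```

Under this model, if hbase-default.xml sets hbase.client.retries.number to 14, a call like getInt("hbase.client.retries.number", 20) returns 14, so the value in code is silently shadowed.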

If we remove it, then we'll only use what is in code.  That means we won't have 
a list of configs in the doc with their descriptions.

We could generate a class from the hbase-default.xml src that wrote out a 
Constants java file with defines that we'd use as the default whenever 
we did Configuration#getInt.  If you added something to hbase-default.xml, 
you'd have to use a constant.  That would mean a script run against the src that 
would fail if it found something in hbase-default.xml whose default in 
code was not an upper-case constant?
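A hedged sketch of the generator idea: it takes (property name, default) pairs, assumed to be already read from hbase-default.xml, and emits Java source with one KEY and one DEFAULT constant per property. The class name, the naming scheme, and keeping every default as a String are assumptions; a real tool would emit int/long/boolean constants so they can feed Configuration#getInt and friends.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical generator: (property name -> default) pairs become Java constants.
// Defaults are kept as strings here; a real tool would choose typed constants.
class DefaultsConstantsGenerator {

    // "hbase.client.retries.number" -> "HBASE_CLIENT_RETRIES_NUMBER"
    static String constantName(String property) {
        return property.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "_");
    }

    // Escape backslashes, quotes and newlines so the value is a valid Java string literal.
    static String javaLiteral(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    static String generate(String className, Map<String, String> defaults) {
        StringBuilder sb = new StringBuilder();
        sb.append("public final class ").append(className).append(" {\n");
        sb.append("  private ").append(className).append("() {}\n");
        // TreeMap gives sorted, stable output so regenerated files diff cleanly.
        for (Map.Entry<String, String> e : new TreeMap<>(defaults).entrySet()) {
            String c = constantName(e.getKey());
            sb.append("  public static final String ").append(c).append("_KEY = ")
              .append(javaLiteral(e.getKey())).append(";\n");
            sb.append("  public static final String ").append(c).append("_DEFAULT = ")
              .append(javaLiteral(e.getValue())).append(";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }
}
```

The "fail the build" check would then compare the keys in hbase-default.xml against the generated constants that code actually references.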

The hbase-site.xml in conf is empty always.  Probably better named 
hbase-site.xml.template.

The other hbase-site.xmls are configs for the local tests.  The notion is that 
tests have shorter timeouts and retries than what we ship as our defaults.  Do 
we want to re-examine this and have the hbase defaults hold for tests too?

Thanks Elliott and Nicolas for figuring this one out.




[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-03 Thread Elliott Clark (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698771#comment-13698771 ]

Elliott Clark commented on HBASE-6295:
--

Ah, I get it now.  I think #2 probably had the most impact here.

On the subject of settings: I opened up HBASE-8810 to bring the constants in 
line with what's in our hbase-default.xml.

bq. I personally think that we should get rid of the hbase-*.xml in our package to be sure we're using the code defaults. Today, we have:

I disagree.  The xml is really useful for users to find important 
settings and their defaults without having to look through the entire code 
base.  I do agree that we should combine all of the test hbase-site.xml files into 
one so that there's less confusion about which one is being used.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-03 Thread Nicolas Liochon (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698756#comment-13698756 ]

Nicolas Liochon commented on HBASE-6295:


1) Settings
I don't think that the settings are good today in trunk.
In hbase-common/resources, we still have:
- hbase.client.pause = 100 (it's 1000 in the code)
- hbase.client.retries.number = 14 (it's 20 in the code)
As a consequence, we retry for ~30s before failing.

We see this in Elliott's log above:
client.AsyncProcess: Attempt 10/14 failed for 1 operations [...]- sleeping 3201 
ms.
=> the max retry is 14, not 20. 
=> after 10 failures, we still sleep for only 3.2 seconds. 
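To make the ~30s figure concrete, here is a small Java sketch of the arithmetic. It assumes the sleep after failed attempt n is pause × BACKOFF[min(n-1, 10)] with BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64}; that table is an assumption (it reproduces the 3201 ms sleep after attempt 10 in the log above, minus 1 ms of jitter), so check HConstants.RETRY_BACKOFF in the version you run.

```java
// Assumed backoff model: sleep after failed attempt n = pause * BACKOFF[min(n - 1, last)].
// The table is an assumption consistent with the log above; jitter is ignored.
class RetryBackoff {
    static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64};

    // Sleep after the given (1-based) failed attempt.
    static long sleepAfterAttemptMs(long pauseMs, int failedAttempt) {
        int idx = Math.min(failedAttempt - 1, BACKOFF.length - 1);
        return pauseMs * BACKOFF[idx];
    }

    // Total time slept before giving up: no sleep follows the last attempt.
    static long totalSleepMs(long pauseMs, int maxAttempts) {
        long total = 0;
        for (int a = 1; a < maxAttempts; a++) {
            total += sleepAfterAttemptMs(pauseMs, a);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sleepAfterAttemptMs(100, 10)); // 3200
        System.out.println(totalSleepMs(100, 14));        // 26300
        System.out.println(totalSleepMs(1000, 20));       // 647000
    }
}
```

Under this model, the XML values (100 ms, 14 attempts) give up after about 26 s of sleeping, in line with the ~30s above, while the code defaults (1000 ms, 20 attempts) would keep retrying for roughly 11 minutes.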

I personally think that we should get rid of the hbase-*.xml in our package to 
be sure we're using the code defaults. Today, we have:
- the defaults in the code
- hbase-default.xml in hbase-common (seems to be used when doing the integration 
test with a cluster)
- hbase-site.xml in hbase-server/test (seems to be used when you run the 
integration test with a minicluster)
- hbase-site.xml in hbase-client
- hbase-site.xml in conf

2) Previously, when we sent a single multi call with 100 puts in it to 
the server and it failed, we counted 100 errors. We now count 1 
error. The previous behavior was a bug, but its consequence is that the 
backoff time was always the max, hiding the impact of the first point 
(basically we were sleeping 14 times for 6 seconds, while now, during the first 
attempts, we sleep 100 ms).

3) For completeness, this patch degrades the MTTR, especially in tests. There are 
two reasons for this. First, the clients send more writes to the server: that's why 
we have better performance with YCSB. Second, during a failure, the clients 
can still send writes to the other regions instead of waiting for the result of 
a write on the recovering regions. This means that there are fewer resources 
available to do the recovery, as the clients are no longer stuck and keep 
sending writes. This is especially visible in tests, because we try to 
write as much as possible (while in real life the write rate is more fixed, as 
the writes come from an external source).



I'm not saying that there are no bugs in the code I committed. It's complex and 
it's not production-proven, so I'm actually quite sure there are bugs :-). What 
I'm saying is:
- in my tests, when the integration test was failing, it was because we reached 
the maximum number of retries.
- finding out what the actual settings in use are is pure hell
- and I was expecting that HBASE-8776 would do this.

I'm gonna hijack HBASE-8776 to propose some settings.




[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread Elliott Clark (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698633#comment-13698633 ]

Elliott Clark commented on HBASE-6295:
--

Correct. 

Here's the timeline as I know it:

# The tests were passing 90% of the time.
# Then the defaults got re-done.
# The tests started failing a lot (they failed 80% of the time).
# Then I put in something to extend the timeouts ( HBASE-8723 ).
# Then they were passing >90% of the time.
# Then this went in.
# Tests have failed consistently since this patch went in (100% of the time on 
any test that uses chaos monkey).
# Then [~sershe] moved the defaults for timeouts back down.
# The tests continued failing.




[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread stack (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698621#comment-13698621 ]

stack commented on HBASE-6295:
--

[~eclark] IT tests were passing even after the defaults had been rejiggered, 
and before this background thread addition?


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread Elliott Clark (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698616#comment-13698616 ]

Elliott Clark commented on HBASE-6295:
--

The only reservation I have with that is that we still don't know why this causes IT 
tests to fail.  They were passing > 90% of the time before this jira was 
committed.  Then they started failing.

Why would the old client threading model recover better than this background 
async model?


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread stack (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698599#comment-13698599 ]

stack commented on HBASE-6295:
--

[~liochon] Go ahead and change the exponential list if you think it is too 
aggressive.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread Lars Hofhansl (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698174#comment-13698174 ]

Lars Hofhansl commented on HBASE-6295:
--

Agreed. The main difference was the block cache size anyway. With the same 
settings, 0.94 is faster than 0.95, which at this point makes sense.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-02 Thread Jean-Marc Spaggiari (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698155#comment-13698155 ]

Jean-Marc Spaggiari commented on HBASE-6295:


[~lhofhansl] Here are the results with those settings.
FilteredScanTest: 11.02 with the settings vs 11.07 without the settings
RandomReadTest: 939 vs 940
RandomSeekScanTest: 225.9 vs 255.8
RandomWriteTest: 200015 vs 21362
RandomScanWithRange10Test: 27807 vs 27720
SequentialRead: 2924 vs 2922

etc.

So results are barely different. I think we should/can move this discussion out 
of this JIRA ;)

[~liochon] I don't have anything running on my server now, so if you want me to 
test your patch again with specific settings, just let me know.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-01 Thread Nicolas Liochon (JIRA)


[ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696843#comment-13696843 ]

Nicolas Liochon commented on HBASE-6295:


[~eclark] I'm currently running the tests on a real cluster with 15 nodes. I'm 
using this command
{code}
bin/hbase org.apache.hadoop.hbase.IntegrationTestsDriver -r IntegrationTestDataIngestSlowDeterministic
{code}

With an empty config file on the client (hence using the code defaults, as YCSB does).


I reproduced an error, but I had this in the logs (it wasn't the last line in 
the logs):
{noformat}
2013-07-01 14:26:05,020 WARN  [hbase-table-pool-13-thread-1] client.AsyncProcess: Attempt #20/20 failed for 1 operations on server ip-10-191-62-44.ec2.internal,60020,1372259546637 NOT resubmitting., tableName=IntegrationTestDataIngestSlowDeterministic, location=region=IntegrationTestDataIngestSlowDeterministic,6ee8,1372688561696.f12d243ed7420921efe5fa30471c102b., hostname=ip-10-191-62-44.ec2.internal,60020,1372259546637, seqNum=1
{noformat}

We're very aggressive with the retries at the beginning: the first retries happen 
after a 100ms sleep, and even after the 16th retry we only wait 6.4s. Maybe 
it's too aggressive. [~sershe], are these back-off times expected, or should I 
get more when I call errorsByServer.calculateBackoffTime?
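To make the numbers above concrete, here is a minimal sketch of a capped multiplicative back-off. The multiplier table is an assumption, loosely in the style of HBase's `HConstants.RETRY_BACKOFF` array, chosen so that a 100ms base tops out at the 6.4s quoted above (100ms x 64); it is not read from `errorsByServer.calculateBackoffTime` or any particular release, and no jitter is added.

```java
// Illustrative back-off sketch. ASSUMPTION: this multiplier table is chosen so
// that a 100 ms base pause tops out at 6.4 s (100 ms x 64); the real values in
// a given HBase branch may differ. No random jitter is added.
class BackoffSketch {
  static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64};

  // Pause before retry number `tries` (0-based); retries past the end of the
  // table reuse the last (largest) multiplier.
  static long pauseMillis(long basePauseMillis, int tries) {
    int idx = Math.min(tries, BACKOFF.length - 1);
    return basePauseMillis * BACKOFF[idx];
  }

  public static void main(String[] args) {
    for (int tries = 0; tries < 20; tries++) {
      System.out.println("retry #" + tries + ": sleep " + pauseMillis(100, tries) + " ms");
    }
  }
}
```

With this table, every retry from the 11th onward sleeps the same capped 6.4s at a 100ms base, so raising the base pause is what stretches the late retries.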


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-07-01 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696650#comment-13696650
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. The bad news is that the addendum didn't fix integration tests on a real cluster. They still fail.

From the logs, the number of retries is 14. In my tests, I had to go up to 20 to 
have something reliable enough (I did most of them with 30, though). In the 
addendum I didn't change the value in hbase-common, and it seems that's the one 
you're using... I can fix it here, but in any case it should be fixed in HBASE-8776.
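The retry count matters because the cumulative sleep grows with it. The sketch below adds up the pauses for a given number of retries, reusing an assumed capped multiplier table (an illustration, not HBase's actual values) and ignoring RPC time, so it gives only the time spent sleeping, a lower bound on how long a client keeps trying.

```java
// Cumulative-sleep sketch. ASSUMPTION: illustrative capped multiplier table;
// real HBase tables may differ. RPC time and jitter are ignored, so the
// result is only the time spent sleeping between attempts.
class RetryBudgetSketch {
  static final int[] BACKOFF = {1, 1, 1, 2, 2, 4, 4, 8, 16, 32, 64};

  // Sum of the pauses taken before each of the first `retries` retries.
  static long totalSleepMillis(long basePauseMillis, int retries) {
    long total = 0;
    for (int i = 0; i < retries; i++) {
      total += basePauseMillis * BACKOFF[Math.min(i, BACKOFF.length - 1)];
    }
    return total;
  }

  public static void main(String[] args) {
    for (int retries : new int[] {14, 20, 30}) {
      System.out.printf("%d retries: %.1f s at a 100 ms base, %.1f s at a 1000 ms base%n",
          retries, totalSleepMillis(100, retries) / 1000.0,
          totalSleepMillis(1000, retries) / 1000.0);
    }
  }
}
```

Under these assumptions, 14, 20 and 30 retries at a 100ms base add up to about 32.7s, 71.1s and 135.1s of sleep, and ten times that at a 1000ms base. If recovering a region takes longer than that window, the client gives up.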



> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696040#comment-13696040
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


bq. I have these set to true in hbase-site.xml:
bq. hbase.ipc.client.tcpnodelay
bq. ipc.server.tcpnodelay
bq.
bq. And these in hdfs-site.xml (so you won't need these, I think):
bq. ipc.server.tcpnodelay
bq. ipc.client.tcpnodelay


In standalone mode I don't have an hdfs-site.xml file, so I put all of them 
in hbase-site.xml and restarted the tests for 0.94.9. I will publish the 
results on the mailing list.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695983#comment-13695983
 ] 

Hudson commented on HBASE-6295:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #588 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/588/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background - addendum (Revision 1497800)

 Result = FAILURE
nkeywal : 
Files : 
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTableMultiplexer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/test/resources/hbase-site.xml


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695950#comment-13695950
 ] 

Hudson commented on HBASE-6295:
---

Integrated in hbase-0.95-on-hadoop2 #153 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/153/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background - addendum (Revision 1497801)

 Result = FAILURE
nkeywal : 
Files : 
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTableMultiplexer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch, 
> hbase-ycsb-workloads Build time trend.png
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695594#comment-13695594
 ] 

Hudson commented on HBASE-6295:
---

Integrated in HBase-TRUNK #4201 (See 
[https://builds.apache.org/job/HBase-TRUNK/4201/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background - addendum (Revision 1497800)

 Result = SUCCESS
nkeywal : 
Files : 
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTableMultiplexer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/trunk/hbase-server/src/test/resources/hbase-site.xml


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695580#comment-13695580
 ] 

Hudson commented on HBASE-6295:
---

Integrated in hbase-0.95 #274 (See 
[https://builds.apache.org/job/hbase-0.95/274/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background - addendum (Revision 1497801)

 Result = SUCCESS
nkeywal : 
Files : 
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* 
/hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTableMultiplexer.java
* 
/hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* /hbase/branches/0.95/hbase-server/src/test/resources/hbase-site.xml


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695498#comment-13695498
 ] 

Nicolas Liochon commented on HBASE-6295:


addendum committed.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-27 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694560#comment-13694560
 ] 

Nicolas Liochon commented on HBASE-6295:


The tests passed every time with a setting of 30. I had issues with 14, 
which is the value in the common XML. The default in HConstants is 20. As the 
right values are being worked out in another JIRA, I just went for 20 in this 
patch; it seems to be OK on a small sample. I think we should get rid of the 
one in hbase-server as well...

I will commit today if there are no objections.


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694205#comment-13694205
 ] 

Elliott Clark commented on HBASE-6295:
--

Oh blah, never mind: it's just that the fallback default wasn't changed but the 
XML was. It really is a 100ms base pause time.
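The point here is which layer wins. Below is a hypothetical, simplified model of layered configuration, loosely based on how Hadoop's `Configuration.getLong(name, defaultValue)` behaves: resources are applied in load order so later ones override earlier ones, and the code-level fallback (the `HConstants` default) is returned only when no XML resource defines the key. `LayeredConfSketch` is illustrative only; the real `Configuration` class also handles final properties, variable expansion and more.

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of layered config lookup: each map stands for one XML
// resource, in load order. Later resources override earlier ones, and the
// caller's fallback is used only when no resource defines the key.
class LayeredConfSketch {
  private final List<Map<String, String>> resources;

  LayeredConfSketch(List<Map<String, String>> resources) {
    this.resources = resources;
  }

  long getLong(String key, long fallback) {
    String value = null;
    for (Map<String, String> resource : resources) {
      if (resource.containsKey(key)) {
        value = resource.get(key); // a later resource wins
      }
    }
    return value == null ? fallback : Long.parseLong(value.trim());
  }
}
```

So as long as a loaded XML resource sets `hbase.client.pause` to 100, a 1000ms fallback in code is never consulted, which is why the effective base pause is 100ms.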

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
> 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694194#comment-13694194
 ] 

Elliott Clark commented on HBASE-6295:
--

This started failing before the retry tweaks went in.
And you were correct yesterday: trunk is still at 1000ms as the default pause 
time, see 
https://github.com/apache/hbase/blob/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java#L554

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
> 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694186#comment-13694186
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

I think it might have been caused by the retry tweaking (the thing we discussed 
yesterday about the pause length). The pause is reduced to 100ms on trunk, while 
it is 1000ms on 0.94, so current trunk retries are too short.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
> 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
> send location.todolist to region server 
> clear location.todolist
> // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694178#comment-13694178
 ] 

Elliott Clark commented on HBASE-6295:
--

So I see this issue on a real cluster where the local conf is added to the 
classpath ahead of any jars. How would the test settings be causing this?

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694163#comment-13694163
 ] 

stack commented on HBASE-6295:
--

+1 on committing bug fix and upping retry count as addendum on this issue.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-26 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13694151#comment-13694151
 ] 

Nicolas Liochon commented on HBASE-6295:


For the logs, it's a bug, easy to fix. I will do it.
For the failure itself, the integration test uses a retry count of 10. This is 
not enough: with 30 it succeeds 5 times out of 5, while I get a 60% failure rate 
with 10. The integration tests run with the value found in 
hbase-server/.../test/resources, and this value was not changed by the various 
JIRAs we had about this default value.

I will run more tests during the night, but this seems to be it.
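
The retry count effectively sets a time budget: it bounds how long a client keeps retrying while a region is unavailable, for example while the chaos monkey moves or kills it. The sketch below illustrates this with a hypothetical capped exponential backoff. The 100 ms base pause, the 10 s cap, and the `RetryBudget` class are assumptions for illustration, not HBase's actual defaults or backoff schedule.

```java
/**
 * Hypothetical illustration: total time a client spends pausing between
 * attempts for a given retry count, using a capped exponential backoff.
 * The backoff shape and values are assumptions, not HBase's schedule.
 */
final class RetryBudget {
  private RetryBudget() {}

  static long totalWaitMillis(int retries, long basePauseMs, long maxPauseMs) {
    long total = 0;
    for (int attempt = 0; attempt < retries; attempt++) {
      // Pause doubles on each attempt (1x, 2x, 4x, ...) and is capped at maxPauseMs.
      long pause = basePauseMs << Math.min(attempt, 30);
      total += Math.min(pause, maxPauseMs);
    }
    return total;
  }
}
```

With these assumed values, `totalWaitMillis(10, 100, 10_000)` is 42,700 ms and `totalWaitMillis(30, 100, 10_000)` is 242,700 ms, so a hypothetical one-minute outage would exhaust the first budget but not the second.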

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-25 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693274#comment-13693274
 ] 

Nicolas Liochon commented on HBASE-6295:


Thanks for the alert, Elliott. I will have a look tomorrow, my time.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-25 Thread Elliott Clark (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13693206#comment-13693206
 ] 

Elliott Clark commented on HBASE-6295:
--

Looks like this broke the integration tests. We've failed 5 different jobs in a 
row. It seems like the async threads don't recover when the chaos monkey is running.

Additionally, there's a lot of log spam at the INFO level:

{code}
2013-06-25 06:54:03,749 INFO  [HBaseWriterThread_4] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6877
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_5] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6848
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_7] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6815
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_0] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6844
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_8] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6850
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_3] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6892
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_6] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6858
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_2] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6849
2013-06-25 06:54:03,750 INFO  [HBaseWriterThread_9] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6873
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_4] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6878
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_1] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6819
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_7] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6816
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_5] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6849
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_0] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6845
2013-06-25 06:54:03,751 INFO  [HBaseWriterThread_8] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6851
2013-06-25 06:54:03,752 INFO  [HBaseWriterThread_3] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6893
2013-06-25 06:54:03,752 INFO  [HBaseWriterThread_2] client.AsyncProcess: IntegrationTestDataIngestSlowDeterministic: Waiting for the global number of tasks to be equals or less than 0, currently it's 6850
{code}
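
The spam comes from a wait loop that logs every time it re-checks the task counter. One hedged way to keep such a loop quiet is to rate-limit the message. The sketch assumes a shared in-flight task counter and uses hypothetical names (`TaskWaiter`, `waitUntilAtMost`); it is not the fix that was committed for this issue.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Hypothetical sketch: wait for in-flight tasks to drain, logging at most once per interval. */
final class TaskWaiter {
  private final AtomicLong tasksInProgress = new AtomicLong();
  private final Object lock = new Object();

  void taskStarted() {
    tasksInProgress.incrementAndGet();
  }

  void taskDone() {
    tasksInProgress.decrementAndGet();
    synchronized (lock) {
      lock.notifyAll();  // wake any waiter so it re-checks the counter
    }
  }

  /** Blocks until at most {@code max} tasks are in progress; returns how many messages were logged. */
  int waitUntilAtMost(long max, long logIntervalMs, Consumer<String> log) throws InterruptedException {
    int logged = 0;
    long nextLogAt = 0;  // currentTimeMillis() is positive, so the first wait logs immediately
    while (true) {
      long current = tasksInProgress.get();
      if (current <= max) {
        return logged;
      }
      long now = System.currentTimeMillis();
      if (now >= nextLogAt) {
        log.accept("Waiting for at most " + max + " tasks in progress, currently " + current);
        logged++;
        nextLogAt = now + logIntervalMs;
      }
      synchronized (lock) {
        // Re-check under the lock so a taskDone() between the two checks is not missed.
        if (tasksInProgress.get() > max) {
          lock.wait(100);
        }
      }
    }
  }
}
```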

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692658#comment-13692658
 ] 

Hudson commented on HBASE-6295:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #582 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/582/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background - round 2 (Revision 1496157)
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background (Revision 1496156)

Result = FAILURE
nkeywal : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* /hbase/trunk/hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java

nkeywal : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Action.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedWithDetailsException.java
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692579#comment-13692579
 ] 

Hudson commented on HBASE-6295:
---

Integrated in hbase-0.95-on-hadoop2 #147 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/147/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background (Revision 1496159)

Result = FAILURE
nkeywal : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Action.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedWithDetailsException.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/branches/0.95/hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692462#comment-13692462
 ] 

Hudson commented on HBASE-6295:
---

Integrated in hbase-0.95 #265 (See 
[https://builds.apache.org/job/hbase-0.95/265/])
HBASE-6295  Possible performance improvement in client batch operations: 
presplit and send in background (Revision 1496159)

Result = SUCCESS
nkeywal : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/Action.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncProcess.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnection.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RetriesExhaustedWithDetailsException.java
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ServerCallable.java
* /hbase/branches/0.95/hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestAsyncProcess.java
* /hbase/branches/0.95/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestHCM.java


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692258#comment-13692258
 ] 

Nicolas Liochon commented on HBASE-6295:


Committed to trunk & 0.95. I've done a lot of tests, but it's quite easy to 
break something in this area, so ping me if you see anything suspicious in the 
next few days.

Thanks a lot for the reviews, and especially to Jean-Marc for all the testing.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692183#comment-13692183
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

The patch looks reasonable, thanks.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691474#comment-13691474
 ] 

Lars Hofhansl commented on HBASE-6295:
--

I have these set to true in hbase-site.xml:
hbase.ipc.client.tcpnodelay
ipc.server.tcpnodelay

And these in hdfs-site.xml (so you won't need these, I think):
ipc.server.tcpnodelay
ipc.client.tcpnodelay
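
For context on the Nagle discussion: at the socket level, "tcpnodelay" corresponds to the standard `TCP_NODELAY` option, which disables Nagle's algorithm so small writes are sent immediately instead of being held back while earlier data is still unacknowledged. The sketch shows the plain `java.net.Socket` call; that the HBase and Hadoop IPC properties listed above end up setting this option on their RPC sockets is an assumption here, and `NoDelaySockets` is a hypothetical helper.

```java
import java.io.IOException;
import java.net.Socket;

/** Minimal illustration of the TCP_NODELAY socket option (Nagle's algorithm off). */
final class NoDelaySockets {
  private NoDelaySockets() {}

  /** Returns an unconnected socket with Nagle's algorithm disabled; connect() it afterwards. */
  static Socket newNoDelaySocket() throws IOException {
    Socket socket = new Socket();   // unconnected; options may be set before connect()
    socket.setTcpNoDelay(true);     // send small writes immediately instead of coalescing them
    return socket;
  }
}
```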


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691471#comment-13691471
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


I'm running in standalone mode, so I don't think Nagle has a big impact, but that 
needs to be verified.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691469#comment-13691469
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


I don't touch the settings: I build the distribution and start it as-is, so I 
compare what we actually distribute (default configuration). Do you want me to try 
0.94 with a specific setting?

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691466#comment-13691466
 ] 

Lars Hofhansl commented on HBASE-6295:
--

Oh... And that might come from different defaults. 0.95/trunk disable Nagle's 
algorithm by default; 0.94 does not.
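For context, a sketch of what disabling Nagle's algorithm means at the socket level: Nagle's algorithm holds back small TCP segments while earlier data is still unacknowledged, which can add latency to small RPC round trips, and setting the TCP_NODELAY option turns it off. The helper below (a hypothetical NagleCheck class, not HBase code) only exercises the standard java.net.Socket option; it does not show how 0.94 or 0.95 map their configuration onto it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

/** Hypothetical helper: applies TCP_NODELAY to a fresh socket and reads it back. */
final class NagleCheck {
  static boolean applyAndRead(boolean disableNagle) {
    try (Socket socket = new Socket()) {        // unconnected; options can be set before connect()
      socket.setTcpNoDelay(disableNagle);       // true = TCP_NODELAY on = Nagle's algorithm off
      return socket.getTcpNoDelay();
    } catch (IOException e) {                   // SocketException is an IOException
      throw new UncheckedIOException(e);
    }
  }
}
```

A real client would set the option on its socket before connecting and sending requests.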

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691465#comment-13691465
 ] 

Lars Hofhansl commented on HBASE-6295:
--

And the default setup? No parallel seeking enabled in StoreScanner?

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691463#comment-13691463
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


I used the latest versions from the branches, i.e. I checked out 
http://svn.apache.org/repos/asf/hbase/branches/0.95/ for 0.95, etc. So 0.94 is 
almost the next RC ;)

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-23 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691457#comment-13691457
 ] 

Lars Hofhansl commented on HBASE-6295:
--

Which version of 0.94/0.95/trunk did you use for this test, [~jmspaggi]?

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690845#comment-13690845
 ] 

Lars Hofhansl commented on HBASE-6295:
--

I had expected them to be roughly equal. I wonder what caused the improvement 
in 0.95+.
We should do that in 0.94 as well.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690839#comment-13690839
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


My bad. The numbers I gave just above are for random writes...

For randomRead:
||Test||Trunk||Nic||0.95||0.94||
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|761449.8|738362.4|754100|1110772|

Which means there is roughly a 30% improvement between 0.94 and 0.95/trunk...

Did you expect 0.94 to be faster than 0.95 for random reads?

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690828#comment-13690828
 ] 

Lars Hofhansl commented on HBASE-6295:
--

OK... I got confused by this line in your first table:
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|761449.8|738362.4|754100|

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690705#comment-13690705
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


114272ms in trunk
100201ms in 0.94
76990ms in trunk+6295
114798ms in 0.95

So trunk and 0.95 are about 10% slower than 0.94 (which I had already seen in 
previous tests); however, with Nic's patch, trunk is faster than all the other 
versions.

On my own cluster I'm doing almost only random writes... So I'm really looking 
forward to seeing this in 0.9x ;)
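To compare the random-write times above more precisely, the relative difference against a baseline is (t - t_base) / t_base. A small helper (hypothetical WriteTimings class) makes the arithmetic explicit. With the numbers above, trunk takes about 14.0% longer than 0.94 and 0.95 about 14.6% longer, while trunk with the patch takes about 23.2% less time than 0.94 and about 32.6% less than plain trunk.

```java
/** Hypothetical helper for comparing PerformanceEvaluation elapsed times (ms). */
final class WriteTimings {
  /** Percentage by which timeMs differs from baselineMs; negative means less time. */
  static double pctVsBaseline(double timeMs, double baselineMs) {
    return (timeMs - baselineMs) / baselineMs * 100.0;
  }
}
```

For example, pctVsBaseline(114272, 100201) compares trunk against 0.94, and pctVsBaseline(76990, 114272) compares the patched trunk against plain trunk.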

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690693#comment-13690693
 ] 

Lars Hofhansl commented on HBASE-6295:
--

So the RandomReadTest takes ~76 (ms?) in 0.95/trunk but takes ~110 ms 
in 0.94? I wonder why this is.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690658#comment-13690658
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Numbers for 0.94...

||Test||0.94||
|org.apache.hadoop.hbase.PerformanceEvaluation$FilteredScanTest|543237.9|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|1110772.6|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|20998.3|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|159891.1|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|100201.9|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|38577.08|


Again, these are times, so you need to compare them with the first table I sent. 
If needed, I can convert them to rows/time.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690540#comment-13690540
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

Can you please update RB? Thanks.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690262#comment-13690262
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


0.94 tests are in progress... they should be done by 12h30 EST. I might be able 
to provide the results at that time. If there is any need to re-run the trunk 
tests with this patch, just let me know. I will most probably buy a second 
dedicated test server soon to be able to run more tests for all those JIRAs ;)

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690253#comment-13690253
 ] 

Nicolas Liochon commented on HBASE-6295:


This was a little bit more painful than I expected.
I've made 3 modifications compared to trunk.
1) It's now initialized in AsyncProcess. This avoids a cast from HConnection to 
HConnectionImpl.
2) I'm reporting an error once per location and per try, instead of once for 
every row within a try.
3) I've changed the internal structure to a ConcurrentMap.
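Points 2 and 3 can be illustrated with a minimal sketch: failures are keyed by (location, try) in a `ConcurrentHashMap`, so concurrent sender callbacks record at most one error per location per try instead of one per row. The class and method names are hypothetical; this is not the actual AsyncProcess code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: keep one error per (location, try) instead of one per row. */
public final class PerLocationErrorLog {
  // Key is (location, try number); the ConcurrentMap lets several sender threads report safely.
  private final ConcurrentMap<Map.Entry<String, Integer>, Throwable> firstErrors =
      new ConcurrentHashMap<>();

  /** Returns true only for the first error reported for this location and try. */
  public boolean report(String location, int tryNumber, Throwable error) {
    return firstErrors.putIfAbsent(Map.entry(location, tryNumber), error) == null;
  }

  /** Number of distinct (location, try) pairs that failed. */
  public int distinctReports() {
    return firstErrors.size();
  }
}
```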

The tests are in progress locally. I will push the patch after the first 
successful run.

[~sershe], could you please give it a quick review? Ideally I would like to 
commit it today; this will save me some merges and will give it more trial 
before the next .95 release candidate...


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689886#comment-13689886
 ] 

Nicolas Liochon commented on HBASE-6295:


[~sershe] Thanks for the update. Ok, will do.

[~jmspaggi] As Sergey said :-). Your test results have been pretty stable, and 
I will try not to break everything. I've also seen that some other JIRAs need 
you as well ;-). And I'm quite interested in the results on 0.94 too.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689881#comment-13689881
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

It's the feature that keeps track of the wait time per server to which we are 
sending the request. So, for example, in a simple case, if we just did a 
16-second retry and now learn that the region is on a different server, we 
don't wait 32 seconds to go there, but go immediately.
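A minimal sketch of per-server backoff, assuming failures are counted per server and the pause doubles with each failure (the doubling schedule is an assumption chosen to match the 16 s to 32 s example; `PerServerBackoff` is a hypothetical class, not HBase's `errorsByServer` implementation). Because the count is per server, a region that moves to a server with no recorded failures is retried without any pause.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-server backoff tracker; not HBase's actual class. */
public final class PerServerBackoff {
  private final long basePauseMs;
  private final long maxPauseMs;
  private final ConcurrentMap<String, Integer> failuresByServer = new ConcurrentHashMap<>();

  public PerServerBackoff(long basePauseMs, long maxPauseMs) {
    this.basePauseMs = basePauseMs;
    this.maxPauseMs = maxPauseMs;
  }

  /** Records one failed attempt against this server. */
  public void reportError(String server) {
    failuresByServer.merge(server, 1, Integer::sum);
  }

  /** 0 for a server we never failed against; otherwise base * 2^(failures - 1), capped. */
  public long backoffMs(String server) {
    int failures = failuresByServer.getOrDefault(server, 0);
    if (failures == 0) {
      return 0;  // new location: go there immediately
    }
    long pause = basePauseMs << Math.min(failures - 1, 30);  // shift bounded to avoid overflow
    return Math.min(pause, maxPauseMs);
  }
}
```

With a 1-second base, five failures on `rs1` give a 16 s pause and a sixth gives 32 s, while `backoffMs("rs2")` stays at 0.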


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689880#comment-13689880
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

Probably not; it is an MTTR feature rather than a performance one.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689877#comment-13689877
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Will I need to re-run the tests?


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689876#comment-13689876
 ] 

Nicolas Liochon commented on HBASE-6295:


I don't fully remember what it is, and I didn't remove it on purpose, but 
hopefully I can put it back. I will do this tomorrow.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689867#comment-13689867
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

Hmm, I just noticed this test removed usage of 
errorsByServer.calculateBackoffTime.
Can it please be put back? I have to withdraw my +1... :(


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689578#comment-13689578
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


It's the time for x rows; the number of rows differs depending on the test.
For RandomReadTest, divide 1048576 rows by the time.
For RandomScanWithRange100Test, divide 4096 rows by the time.
For RandomSeekScanTest, divide 40960 rows by the time.
For RandomWriteTest, divide 1048576 rows by the time.
For SequentialWriteTest, divide 1048576 rows by the time.

This gives the number of rows per ms, so multiply by 1000 to get rows per 
second. Some results are in rows/minute, so adjust accordingly.
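The conversion described above (rows divided by elapsed milliseconds, then scaled to seconds or minutes) fits in a small helper. `PeThroughput` is a hypothetical name, not part of PerformanceEvaluation; the sample inputs are trunk figures from the tables in this thread and should reproduce 13592.40 rows/s for SequentialWriteTest and 11243.12 rows/min for RandomScanWithRange100Test.

```java
/** Hypothetical helper converting PerformanceEvaluation elapsed times into throughput. */
public final class PeThroughput {
  private PeThroughput() {}

  /** Rows per second, given the rows processed and the elapsed time in milliseconds. */
  public static double rowsPerSecond(long rows, double elapsedMs) {
    return rows / elapsedMs * 1000.0;
  }

  /** Rows per minute, for the tests reported in rows/minute. */
  public static double rowsPerMinute(long rows, double elapsedMs) {
    return rows / elapsedMs * 60_000.0;
  }

  public static void main(String[] args) {
    // SequentialWriteTest, trunk: 1048576 rows in 77144.275 ms
    System.out.printf("%.2f rows/s%n", rowsPerSecond(1_048_576, 77144.275));
    // RandomScanWithRange100Test, trunk: 4096 rows in 21858.7 ms
    System.out.printf("%.2f rows/min%n", rowsPerMinute(4096, 21858.7));
  }
}
```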

So if you want to compare, here are the numbers in the same format as the PDF 
that I usually produce:
||Test||Trunk||Nic||0.95||
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|1377.08|1420.14|1390.50|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|11243.12|10992.68|10971.09|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|304.66|296.43|305.25|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|9176.07|13619.59|9134.09|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|13592.40|42655.52|13255.12|

I had already noticed the RandomWriteTest impact when comparing the 0.94 branch 
and 0.95...

I will re-run the 0.94 tests to make sure, but overall, I really think 0.95 is 
not doing as well as 0.94 for the RandomWriteTest.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689519#comment-13689519
 ] 

Nicolas Liochon commented on HBASE-6295:


Can I do 2097152 / 79 = ~26500 to compare with the performance tests previously 
described in 
http://www.spaggiari.org/media/blogs/hbase/pictures/performances_20130321.pdf?

Because the performance was better previously (~35k rows/second).

Same for 2097152 / 114 = 18396 vs. ~30k.

Or is it calculated differently?


Anyway, thanks a lot for all these great tests. I will commit tomorrow morning 
my time if there is no objection.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689480#comment-13689480
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


||Test||Trunk (ms)||Nic (ms)||0.95 (ms)||
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest|761449.8|738362.4|754100|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test|21858.7|22356.7|22400.7|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest|13.6|138179.3|134186.7|
|org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest|114272.9|76990.3|114798.1|
|org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest|77144.275|24582.425|79107.25|

So trunk and 0.95 are consistent, while Nic's version shows a nice improvement 
on the write operations (both random and sequential: about 1.5x faster for 
RandomWriteTest and about 3x for SequentialWriteTest), a very small 
degradation on SeekScan, and a small improvement on RandomRead.

Do you need the IntegrationTestBigLinkedList for the 3 releases too?


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689196#comment-13689196
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


The tests crashed yesterday for some obscure ZK reasons, so I had to restart 
them. They should be done now. I will add 0.95 to the list and run it, which 
means I should have all the results this evening (EST). I will take the 
required time to provide the feedback today.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-20 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689055#comment-13689055
 ] 

Nicolas Liochon commented on HBASE-6295:


[~jmspaggi] I'm waiting for your feedback then. BTW, if you have time ( :-) ), 
publishing a comparison between 0.95 without this patch and 0.94 might be 
useful. I'm saying this because if 0.95 has a performance degradation compared 
with 0.94, this patch will hide it...


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687332#comment-13687332
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Tests are running. I might be able to post them tomorrow evening.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Eric Newton (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686790#comment-13686790
 ] 

Eric Newton commented on HBASE-6295:


[~nkeywal] With the Accumulo batch scanner, a single client groups requests by 
server: queries for multiple ranges on the same node are sent to that node in 
a single request. If the number of client query threads is greater than the 
number of nodes, multiple threads may be used. The BatchWriter, however, will 
only use one thread to write to any one server.
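The writer-side behaviour Eric describes (at most one writing thread per server) could be approximated, as an illustration only, with one single-threaded executor per server: batches for the same server queue up and run one at a time, while different servers are written in parallel. This is not Accumulo's BatchWriter code; `PerServerWriter`, `write()` and the server names are hypothetical, and a short sleep stands in for the RPC.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not Accumulo code): one writer thread per server.
public class PerServerWriter {
    private final Map<String, ExecutorService> writers = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> active = new ConcurrentHashMap<>();
    private final Map<String, AtomicInteger> peak = new ConcurrentHashMap<>();

    // Queue a batch for a server; it starts only after earlier batches for that server finish.
    Future<?> write(String server, List<String> mutations) {
        ExecutorService ex = writers.computeIfAbsent(server, s -> Executors.newSingleThreadExecutor());
        return ex.submit(() -> {
            int now = active.computeIfAbsent(server, s -> new AtomicInteger()).incrementAndGet();
            peak.computeIfAbsent(server, s -> new AtomicInteger()).accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(5); // stand-in for the RPC that would carry `mutations`
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                active.get(server).decrementAndGet();
            }
        });
    }

    // Highest number of batches observed running at once for this server.
    int peakConcurrency(String server) {
        AtomicInteger p = peak.get(server);
        return p == null ? 0 : p.get();
    }

    void close() {
        for (ExecutorService ex : writers.values()) {
            ex.shutdown();
            try {
                ex.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        PerServerWriter w = new PerServerWriter();
        for (int i = 0; i < 3; i++) w.write("tserver1", List.of("m" + i));
        for (int i = 0; i < 2; i++) w.write("tserver2", List.of("n" + i));
        w.close();
        System.out.println(w.peakConcurrency("tserver1")); // 1
        System.out.println(w.peakConcurrency("tserver2")); // 1
    }
}
```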

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686733#comment-13686733
 ] 

Nicolas Liochon commented on HBASE-6295:


v14 is what I'm likely to commit to .95 and .97. Feedback welcome. The whole 
test suite ran twice without issue on my machine. JM, don't hesitate to run it 
again. It should give the same performance results, as I haven't changed the 
algorithm itself.

[~ecn] Thanks a lot for the info. I looked at the code, and it actually seems 
very similar. I've got one question: we support sending multiple queries in 
parallel to the same region, but by default we don't: we have only one query 
at a time per region. Do you do something similar?
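The default described here (one query in flight per region) can be expressed as a per-region counter with a cap, where a region that is already busy keeps its operations buffered until its current request completes. A minimal sketch; `RegionThrottle`, `maxPerRegion` and the region names are hypothetical and are not the HBase configuration or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: limit the number of requests in flight per region.
public class RegionThrottle {
    private final int maxPerRegion;
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    RegionThrottle(int maxPerRegion) {
        this.maxPerRegion = maxPerRegion;
    }

    // Reserve a slot for this region; false means "keep its operations buffered for now".
    boolean tryAcquire(String region) {
        AtomicInteger n = inFlight.computeIfAbsent(region, r -> new AtomicInteger());
        while (true) {
            int current = n.get();
            if (current >= maxPerRegion) {
                return false;
            }
            if (n.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Call when the request for this region completes, successfully or not.
    void release(String region) {
        inFlight.get(region).decrementAndGet();
    }

    // Keep only the batches whose region still has a free slot, in order.
    List<String> selectSendable(List<String> batchRegions) {
        List<String> sendable = new ArrayList<>();
        for (String region : batchRegions) {
            if (tryAcquire(region)) {
                sendable.add(region);
            }
        }
        return sendable;
    }

    public static void main(String[] args) {
        RegionThrottle t = new RegionThrottle(1); // one request per region at a time
        System.out.println(t.selectSendable(List.of("regionA", "regionB", "regionA"))); // [regionA, regionB]
        t.release("regionA");
        System.out.println(t.selectSendable(List.of("regionA"))); // [regionA]
    }
}
```

Raising `maxPerRegion` corresponds to the non-default mode where several queries may be sent to the same region in parallel.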

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686080#comment-13686080
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Regarding the stack trace, I agree: INFO is too much. DEBUG or even TRACE 
might be better...
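A retry message like the one discussed can be logged at a debug level and skipped entirely when that level is disabled. The sketch uses the JDK's `java.util.logging` only to stay self-contained (its `FINE` level plays the role of DEBUG); HBase itself uses a different logging API, and `RetryLogging` and `logRetry` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: log client retries at a debug-like level only.
public class RetryLogging {
    static final Logger LOG = Logger.getLogger(RetryLogging.class.getName());

    // Returns the message that was logged, or null when FINE is disabled.
    static String logRetry(int attempt, int ops, String server, Throwable last) {
        if (!LOG.isLoggable(Level.FINE)) {
            return null; // skip building the message at the default INFO level
        }
        String msg = "Attempt #" + attempt + " failed for " + ops + " operations on server "
                + server + ", resubmitting " + ops + ", last exception was: " + last;
        LOG.fine(msg);
        return msg;
    }

    public static void main(String[] args) {
        Exception e = new Exception("region is closing");
        String server = "hbasetest,56046,1371448843669";
        // With the JDK's default configuration (root level INFO) nothing is built or logged.
        System.out.println(logRetry(1, 1395, server, e));
        LOG.setLevel(Level.FINE);
        // The message is now built and handed to the logger; the default console
        // handler still filters records below INFO, so it is not printed there.
        System.out.println(logRetry(1, 1395, server, e) != null);
    }
}
```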

I will re-run all my tests with your next version.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Eric Newton (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686033#comment-13686033
 ] 

Eric Newton commented on HBASE-6295:


I just stumbled into this ticket today.  This approach to client writes is the 
same as Accumulo's BatchWriter.  It's very helpful for write-heavy loads.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685960#comment-13685960
 ] 

Nicolas Liochon commented on HBASE-6295:


Thanks a lot JM.
The log line is something I added in the patch. On trunk, we log nothing when 
we retry. I can change it to DEBUG, maybe.
The read results seem in line with trunk today (as expected).
The write results seem better than or similar to my own results, which is great.

I'm going to do some final polishing, run all the tests locally, and I will 
commit to trunk & 0.95, hopefully before the end of the week.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685560#comment-13685560
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Other tests seem to be consistent, even if I don't get exactly the same 
results... I will run some more.

bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 2 1 300 /tmp/biglinkedlist 1

Trunk:
2013-06-17 08:37:08,264 INFO  [main] mapred.JobClient: Job complete: job_local_0006
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: REFERENCED=600
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: RPC_CALLS=609
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: NUM_SCANNER_RESTARTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: MILLIS_BETWEEN_NEXTS=41071
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: BYTES_IN_RESULTS=36000
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: REGIONS_SCANNED=4
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: REMOTE_RPC_RETRIES=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   File Output Format Counters
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient: Bytes Written=8
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient: FILE_BYTES_READ=5696162333
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient: FILE_BYTES_WRITTEN=6730223455
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   File Input Format Counters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient: Bytes Read=0
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient: Map output materialized bytes=41424
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Map input records=600
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Reduce shuffle bytes=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Spilled Records=39145720
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Map output bytes=39000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Total committed heap usage (bytes)=1303552000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: CPU time spent (ms)=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: SPLIT_RAW_BYTES=422
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient: Combine input records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Reduce input records=1200
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Reduce input groups=600
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Combine output records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Physical memory (bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Reduce output records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Virtual memory (bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient: Map output records=1200
2013-06-17 08:37:08,271 INFO  [main] test.IntegrationTestBigLinkedList$Loop: Verify finished with succees. Total nodes=600


Nic:
2013-06-17 08:44:47,530 INFO  [main] mapred.JobClient: Job complete: job_local_0006
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient: REFERENCED=600
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: REMOTE_RPC_CALLS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: RPC_CALLS=607
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: RPC_RETRIES=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: NUM_SCANNER_RESTARTS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: MILLIS_BETWEEN_NEXTS=39871
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: BYTES_IN_RESULTS=36000
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient: BYTE

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685492#comment-13685492
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Here are the results for your version:

org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 728486.5
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 22197.4
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 137125.4
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 69712
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 24342.1

Trunk:
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 757343.8
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 21856.6
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 134333
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 112591.3
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 77897.975

Comparison (the smaller the better):
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest                -4%
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test     2%
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest             2%
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest              -62%
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest         -220%
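The comparison percentages appear to be computed as (patched - trunk) / patched, rounded to the nearest percent; this is inferred from the posted numbers, not stated in the comment. Under that reading, a negative value means the patched client took less time than trunk. The snippet below recomputes the column from the two result lists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Recomputes the "Comparison" column, assuming it is (patched - trunk) / patched.
public class PeComparison {
    static long relativeDiffPercent(double patched, double trunk) {
        return Math.round((patched - trunk) / patched * 100.0);
    }

    public static void main(String[] args) {
        // {patched, trunk} as posted above; lower is better.
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("RandomReadTest", new double[] {728486.5, 757343.8});           // -4
        results.put("RandomScanWithRange100Test", new double[] {22197.4, 21856.6}); // 2
        results.put("RandomSeekScanTest", new double[] {137125.4, 134333});         // 2
        results.put("RandomWriteTest", new double[] {69712, 112591.3});             // -62
        results.put("SequentialWriteTest", new double[] {24342.1, 77897.975});      // -220
        for (Map.Entry<String, double[]> e : results.entrySet()) {
            double[] v = e.getValue();
            System.out.printf("%-28s %5d%%%n", e.getKey(), relativeDiffPercent(v[0], v[1]));
        }
    }
}
```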

I now have IntegrationTestBigLinkedList running and will run more. I'll keep 
you posted.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-17 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685466#comment-13685466
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Results...

First, I'm getting a lot of this in the new version:

2013-06-17 02:02:01,660 INFO  [hbase-table-pool-6-thread-1] client.AsyncProcess: Attempt #1 failed for 1395 operations on server hbasetest,56046,1371448843669, resubmitting 1395, tableName=TestTable, last exception was: org.apache.hadoop.hbase.exceptions.NotServingRegionException: org.apache.hadoop.hbase.exceptions.NotServingRegionException: TestTable,057204,1371448911838.a2579d421e3a844ef5cc87d84219defe. is closing
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5347)
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5315)
    at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1921)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:3954)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:3915)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3271)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20938)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)


And none on the trunk version.
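The log shows the new client resubmitting the 1395 operations that failed because the region was closing. The issue description points out that retried operations must share the per-location buffers with newly added ones. A minimal, synchronous sketch of that idea follows; `RetryingSender`, `trySend`, `maxAttempts` and the linear backoff are hypothetical choices, not the AsyncProcess implementation. Failed batches go back into the same buffer that new operations would join, and the client gives up after a fixed number of rounds.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: failed batches are re-buffered with new operations and retried.
public class RetryingSender {
    private final Map<String, List<String>> buffers = new HashMap<>(); // server -> pending ops
    private final BiPredicate<String, List<String>> trySend;          // false = retriable failure
    private final int maxAttempts;
    private final long backoffMs;
    int sendCalls = 0; // number of per-server send attempts, for inspection

    RetryingSender(BiPredicate<String, List<String>> trySend, int maxAttempts, long backoffMs) {
        this.trySend = trySend;
        this.maxAttempts = maxAttempts;
        this.backoffMs = backoffMs;
    }

    // New operations and retried ones end up in the same per-server list.
    void add(String server, String op) {
        buffers.computeIfAbsent(server, s -> new ArrayList<>()).add(op);
    }

    void flush() {
        for (int attempt = 1; !buffers.isEmpty(); attempt++) {
            if (attempt > maxAttempts) {
                throw new IllegalStateException(
                        "giving up after " + maxAttempts + " attempts for " + buffers.keySet());
            }
            Map<String, List<String>> round = new HashMap<>(buffers);
            buffers.clear();
            for (Map.Entry<String, List<String>> e : round.entrySet()) {
                sendCalls++;
                if (!trySend.test(e.getKey(), e.getValue())) {
                    e.getValue().forEach(op -> add(e.getKey(), op)); // resubmit in a later round
                }
            }
            if (!buffers.isEmpty()) {
                sleep(backoffMs * attempt); // linear backoff before the next round
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] rs1Calls = {0};
        RetryingSender s = new RetryingSender((server, ops) -> {
            // Simulate "region is closing" on the first attempt to rs1 only.
            boolean ok = !(server.equals("rs1") && rs1Calls[0]++ == 0);
            System.out.println((ok ? "sent " : "failed ") + ops + " to " + server);
            return ok;
        }, 3, 10);
        s.add("rs1", "put-a");
        s.add("rs1", "put-b");
        s.add("rs2", "put-c");
        s.flush();
        System.out.println("send attempts: " + s.sendCalls); // 3
    }
}
```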

RandomWriteTests on your version took an average of 69712 seconds
RandomWriteTests on trunk took an average of 112591.3 seconds

So I can see an improvement, but I now need to figure out whether the data is 
correct...

I have the results for the reads too. I will extract them and post them here. I 
will also run some other tests to check whether the data is correct...


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-16 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684919#comment-13684919
 ] 

Nicolas Liochon commented on HBASE-6295:


Thanks Stack. This build failed without a clear reason. I've built a new patch 
and triggered a build myself.

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-16 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684805#comment-13684805
 ] 

stack commented on HBASE-6295:
--

[~liochon] I am not sure why precommit stopped working. I triggered one 
manually for you: 
https://builds.apache.org/job/PreCommit-HBASE-Build/6046/console

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-15 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684569#comment-13684569
 ] 

Nicolas Liochon commented on HBASE-6295:


I was expecting a QA run that never came, but if you have time, please do.
The issue was in flush commit; it's fixed.
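The comment does not say what the flush-commit problem was. As a hedged illustration of the contract a client with background sends has to keep, the sketch below makes `flush()` send whatever is still buffered and then block until every earlier background send has completed, rethrowing a failure instead of losing it. `BackgroundWriter`, `put()` and `flush()` are hypothetical names, not the HTable API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Hypothetical sketch: flush() must wait for every background send and surface failures.
public class BackgroundWriter implements AutoCloseable {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> sender; // stand-in for the RPC

    BackgroundWriter(int batchSize, Consumer<List<String>> sender) {
        this.batchSize = batchSize;
        this.sender = sender;
    }

    void put(String op) {
        buffer.add(op);
        if (buffer.size() >= batchSize) {
            sendBuffered(); // returns immediately, the send runs on the pool
        }
    }

    private void sendBuffered() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        pending.add(CompletableFuture.runAsync(() -> sender.accept(batch), pool));
    }

    // Send the remainder, then block until all background sends are done.
    // A failed send makes this throw CompletionException instead of being dropped.
    void flush() {
        if (!buffer.isEmpty()) {
            sendBuffered();
        }
        try {
            CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        } finally {
            pending.clear();
        }
    }

    @Override
    public void close() {
        try {
            flush();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> delivered = Collections.synchronizedList(new ArrayList<>());
        try (BackgroundWriter w = new BackgroundWriter(2, delivered::addAll)) {
            for (int i = 0; i < 5; i++) {
                w.put("op" + i);
            }
            w.flush();
            System.out.println("delivered after flush: " + delivered.size()); // 5
        }
        try (BackgroundWriter bad = new BackgroundWriter(1,
                batch -> { throw new IllegalStateException("region is closing"); })) {
            bad.put("x");
            bad.flush();
        } catch (CompletionException e) {
            System.out.println("flush failed: " + e.getCause().getMessage());
        }
    }
}
```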
On 15 June 2013 at 17:59, "Jean-Marc Spaggiari (JIRA)" wrote:

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-15 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684524#comment-13684524
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Do you want me to give version 11 a try?

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v1.patch, 6295.v2.patch, 
> 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 
> 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
> split todolist per location
> send split lists to region servers
> clear todolist
> wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be 
> shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes
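The proposed loop can be sketched in Java roughly as follows. This is a minimal, hypothetical illustration rather than the HBase client code: `PerLocationBatcher`, the `locator` and `sender` functions, the plain `ExecutorService` and the per-location limit are all assumptions, and the retry/error management that the description calls the hard part is deliberately left out. A location's list is sent once it reaches `maxLocationSize`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Hypothetical sketch of the proposed client batch loop: operations are split
 * per location as they arrive, and a location's list is sent in the background
 * as soon as it is full. Retries and error handling are intentionally omitted.
 */
class PerLocationBatcher<Op> {
  private final int maxLocationSize;                  // assumed per-location batch limit
  private final Function<Op, String> locator;         // assumed: operation -> server location
  private final BiConsumer<String, List<Op>> sender;  // assumed: sends one batch to one server
  private final ExecutorService pool;                 // background senders

  PerLocationBatcher(int maxLocationSize, Function<Op, String> locator,
                     BiConsumer<String, List<Op>> sender, ExecutorService pool) {
    this.maxLocationSize = maxLocationSize;
    this.locator = locator;
    this.sender = sender;
    this.pool = pool;
  }

  void batch(List<Op> ops) throws Exception {
    Map<String, List<Op>> todoByLocation = new HashMap<>();
    List<Future<?>> inFlight = new ArrayList<>();
    for (Op o : ops) {
      String location = locator.apply(o);                               // get location
      List<Op> todo = todoByLocation.computeIfAbsent(location, k -> new ArrayList<>());
      todo.add(o);                                                      // add o to location.todolist
      if (todo.size() >= maxLocationSize) {
        inFlight.add(pool.submit(() -> sender.accept(location, todo))); // send in background
        todoByLocation.put(location, new ArrayList<>());                // fresh list; the sender owns the old one
      }
      // don't wait, continue the loop
    }
    for (Map.Entry<String, List<Op>> e : todoByLocation.entrySet()) {  // send remaining
      List<Op> rest = e.getValue();
      if (!rest.isEmpty()) {
        String location = e.getKey();
        inFlight.add(pool.submit(() -> sender.accept(location, rest)));
      }
    }
    for (Future<?> f : inFlight) {
      f.get(); // wait; rethrows (wrapped) any failure from a background send
    }
  }
}
```

One design point: after a full list is handed to a background task, the sketch starts a new list for that location instead of clearing the submitted one, because the background sender may still be reading it. Sharing lists between the sender and the retry path is exactly where the real error management gets tricky.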


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-14 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683493#comment-13683493
 ] 

Nicolas Liochon commented on HBASE-6295:


Thanks again, Jean-Marc. I'm going to work on a new version.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-14 Thread Jean-Marc Spaggiari (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13683434#comment-13683434
 ] 

Jean-Marc Spaggiari commented on HBASE-6295:


Hi Nicolas,

As requested, here are some performance tests for your patch.

With your patch applied on yesterday's trunk:
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 765360.3
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 21109.7
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 126617.6
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 1046473.4
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 762233.175

Yesterday's trunk without your patch:
org.apache.hadoop.hbase.PerformanceEvaluation$RandomReadTest 773127.4
org.apache.hadoop.hbase.PerformanceEvaluation$RandomScanWithRange100Test 22348.3
org.apache.hadoop.hbase.PerformanceEvaluation$RandomSeekScanTest 134876.5
org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest 115992.9
org.apache.hadoop.hbase.PerformanceEvaluation$SequentialWriteTest 78791.275

The reads and scans are not impacted, but the writes are negatively impacted 
with the version I tried.
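To put the gap in numbers, the small program below divides each "with patch" figure by the matching trunk figure. It assumes the figures are elapsed times, where lower is better, which is how the conclusion above reads them; the class name `PerfComparison` is illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Compares the two PerformanceEvaluation runs quoted above.
 * Assumption: the figures are elapsed times, so a ratio above 1.0
 * means the run with the patch took longer than trunk.
 */
class PerfComparison {
  // test name -> {with patch, trunk without patch}, copied from the comment
  static final Map<String, double[]> RESULTS = new LinkedHashMap<>();
  static {
    RESULTS.put("RandomReadTest", new double[] {765360.3, 773127.4});
    RESULTS.put("RandomScanWithRange100Test", new double[] {21109.7, 22348.3});
    RESULTS.put("RandomSeekScanTest", new double[] {126617.6, 134876.5});
    RESULTS.put("RandomWriteTest", new double[] {1046473.4, 115992.9});
    RESULTS.put("SequentialWriteTest", new double[] {762233.175, 78791.275});
  }

  static double ratio(double withPatch, double trunk) {
    return withPatch / trunk;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, double[]> e : RESULTS.entrySet()) {
      double[] v = e.getValue();
      System.out.printf("%-28s %.2f%n", e.getKey(), ratio(v[0], v[1]));
    }
  }
}
```

With the figures above this gives roughly 9.0 for RandomWriteTest and 9.7 for SequentialWriteTest, against 0.94 to 0.99 for the read and scan tests, so the write regression is about ninefold under that assumption.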

Just let me know when your next version is ready and I will be 
very happy to test it again.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-11 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680501#comment-13680501
 ] 

Ted Yu commented on HBASE-6295:
---

After putting the patch on a cluster, I saw a lot of the following in the log:
{code}
2013-06-11 16:51:19,806 INFO  [HBaseWriterThread_11] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO  [HBaseWriterThread_18] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
2013-06-11 16:51:19,807 INFO  [HBaseWriterThread_15] client.AsyncProcess: won: Waiting for number of tasks to be equals or less than 0, currently it's 1
{code}
I think the above log should be at TRACE level.
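A minimal sketch of a task-count wait loop whose progress message is logged at a trace-like level. It uses `java.util.logging` with `Level.FINEST` as a stand-in for TRACE (an assumption; the HBase client has its own logging setup), and `TaskLimiter` and its methods are hypothetical names, not the actual `AsyncProcess` code. The `isLoggable` guard also avoids building the message string when tracing is off.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical task limiter: callers block until the number of in-flight
 * tasks drops to a maximum. The per-iteration progress message is logged at
 * FINEST, used here as java.util.logging's closest equivalent to TRACE.
 */
class TaskLimiter {
  private static final Logger LOG = Logger.getLogger(TaskLimiter.class.getName());
  private final AtomicLong tasksInProgress = new AtomicLong();

  void taskSent() {
    tasksInProgress.incrementAndGet();
  }

  void taskDone() {
    tasksInProgress.decrementAndGet();
    synchronized (this) {
      notifyAll(); // wake any thread blocked in waitForMaximumTasks
    }
  }

  long inProgress() {
    return tasksInProgress.get();
  }

  void waitForMaximumTasks(long max) throws InterruptedException {
    synchronized (this) {
      long current;
      while ((current = tasksInProgress.get()) > max) {
        // Guarded so the string is only built when fine-grained tracing is enabled.
        if (LOG.isLoggable(Level.FINEST)) {
          LOG.finest("Waiting for number of tasks to be equal to or less than "
              + max + ", currently it's " + current);
        }
        wait(100); // timed wait: re-check periodically even without a notification
      }
    }
  }
}
```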

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-10 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680076#comment-13680076
 ] 

Ted Yu commented on HBASE-6295:
---

For AsyncProcess:
{code}
+   * This interface allows to keep the interface of the previous synchronous interface, that uses
{code}
'keep the interface of the previous synchronous interface' -> 'keep the 
previous synchronous interface'
{code}
+  public static interface AsyncProcessCallback {
{code}
The above interface can be package-private.
{code}
+boolean failure(int originalIndex, byte[] region, byte[] row, Throwable t);
+boolean retriableFailure(int originalIndex, Row row, byte[] region, Throwable exception);
{code}
Why is the row passed as different types in the two methods above?
{code}
+  private void addAction(HRegionLocation loc, Action action, Map> actionsByServer) {
+if (loc != null) {
{code}
nit: you can use 'if (loc == null) {' to return early.
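The early-return form of this nit can be sketched as below. `String` stand-ins replace `HRegionLocation` and `Action` because the generic parameters of the quoted map did not survive; `ActionGrouper` and the grouping logic are hypothetical, not the patch's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for addAction, using Strings for locations and actions. */
class ActionGrouper {
  static void addAction(String loc, String action, Map<String, List<String>> actionsByServer) {
    if (loc == null) {
      return; // guard clause: return early instead of nesting the body in 'if (loc != null) {'
    }
    // Group the action under its server, creating the list on first use.
    actionsByServer.computeIfAbsent(loc, k -> new ArrayList<>()).add(action);
  }
}
```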

For shouldSubmit():
{code}
+locationException = new IOException("No location found, aborting submit for" +
+" tableName=" + Bytes.toString(tableName));
{code}
Should the row key be included in the above message?
{code}
+Map regionStatus = new HashMap();
{code}
Add a comment for the meaning of the Boolean value?


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668712#comment-13668712
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. can you please file a jira for Process code de-duping/removal
I've removed the Process class already :-)
I need to look at these test errors. I haven't got them locally.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668693#comment-13668693
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

Patch looks reasonable so far... Can you please file a JIRA for Process code 
de-duping/removal?

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668571#comment-13668571
 ] 

Hadoop QA commented on HBASE-6295:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12585051/6295.v9.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailover
   org.apache.hadoop.hbase.client.TestHCM

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5851//console

This message is automatically generated.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668496#comment-13668496
 ] 

Nicolas Liochon commented on HBASE-6295:


v9 contains the "done" mentioned above. The "will do" will come in a later 
version.
For cast vs. interface I don't have a strong opinion... I will see in the next 
version.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668451#comment-13668451
 ] 

stack commented on HBASE-6295:
--

I would suggest re-adding the cast rather than adding a new method to HConnection 
(even though I suggested using HConnection instead of HCI). I say this because 
the method added seems like it should not be public.

Up to you, N. Whatever makes most sense.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668441#comment-13668441
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. Are the above failures because of the patch?
TestHCM: it's a bad fix of an old hidden bug. I've got what I expect to be the 
right fix, and I'm testing it locally right now.

bq. The new class needs a license.
Done.

bq. Does AsyncProcessCallback have to be public? Is it only used inside the client package? If so, shut down access.
Done.

bq. We need these atomics + AtomicInteger ct = taskCounterPerRegion.get(encodedRegionName);? It is single threaded access right? Or you need it for internals?
We need them because, although the client is single-threaded, we receive the 
results and resubmit in parallel.

bq. + static class MyAsyncProcess extends AsyncProcess {
It should. I will double check.

bq. Why we need to add this, updateCachedLocations, if its internally used? It means you can use HConnection in more places instead of HCI?
Yes, I removed the cast to HCI (but I now have to push updateCachedLocations into 
the interface).

bq. If so, why not Result? If not and it is just generic, just R? Res confuses.
Will do.

I'm going to test it on a real cluster. I've actually tested trunk a lot with 
1.7, and overall it was working great. But right now, still with trunk, I got 
stuck. I'm also going to test 0.95.1.
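Why atomics are needed despite a single submitting thread can be sketched as follows: the client thread increments a per-region counter when it submits, while pool threads decrement it as results come back (and may resubmit). Only `taskCounterPerRegion` and `encodedRegionName` come from the patch excerpt; the `ConcurrentHashMap`/`AtomicInteger` layout and the method names are assumptions for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-region task counter shared by the client thread and result threads. */
class RegionTaskCounter {
  private final ConcurrentMap<String, AtomicInteger> taskCounterPerRegion = new ConcurrentHashMap<>();

  // Called by the single submitting client thread.
  void incTaskCounter(String encodedRegionName) {
    taskCounterPerRegion.computeIfAbsent(encodedRegionName, k -> new AtomicInteger()).incrementAndGet();
  }

  // Called concurrently from pool threads when a result (or a failure to retry) comes back.
  void decTaskCounter(String encodedRegionName) {
    AtomicInteger ct = taskCounterPerRegion.get(encodedRegionName);
    if (ct != null) {
      ct.decrementAndGet();
    }
  }

  int get(String encodedRegionName) {
    AtomicInteger ct = taskCounterPerRegion.get(encodedRegionName);
    return ct == null ? 0 : ct.get();
  }
}
```

With plain `int` fields, the increments from the client thread and the decrements from the result threads could interleave and lose updates; the atomic counters make each update indivisible.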

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668430#comment-13668430
 ] 

stack commented on HBASE-6295:
--

Are the above failures because of the patch?

The new class needs a license.

Great class doc on the new AsyncProcess class.

Does AsyncProcessCallback have to be public?  Is it only used inside the client 
package?   If so, shut down access.

We need these atomics +AtomicInteger ct = 
taskCounterPerRegion.get(encodedRegionName);?  It is single threaded access 
right?  Or you need it for internals?

Why do we need to add this, updateCachedLocations, if it's used internally?  Does it 
mean you can use HConnection in more places instead of HCI?

Is this a Result or not?

+  static class MyAsyncProcess extends AsyncProcess {

If so, why not Result?  If not and it is just generic, just R?  Res confuses.

I love the test for the new class.

Excellent.

+1 after fixing unit tests.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13668309#comment-13668309
 ] 

Hadoop QA commented on HBASE-6295:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12585022/6295.v8.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the trunk's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces lines longer than 
100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailover
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed
   org.apache.hadoop.hbase.client.TestHCM

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5848//console

This message is automatically generated.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-21 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13663266#comment-13663266
 ] 

Nicolas Liochon commented on HBASE-6295:


rb: https://reviews.apache.org/r/11318/


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-17 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661080#comment-13661080
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. HConnectionImplementation which are not in HConnection
The problem I have here is the link between HTable and 
HConnectionImplementation. I don't have a nice solution.

bq. I think including tableName and row.getRow() in exception message would 
help debug.
I've done it for tableName but not for getRow, as the row can sometimes be 
quite big.

bq. Also, there's still large scale (hundreds of lines) copy-pasted code shared 
between AsyncProcess and Process. If we don't get rid of Process fast (and I 
suspect realistically we won't) it can become a problem. Can at least some 
shared code be made shared?
That's the big one. 'Process' is not a public class. I tried to reimplement the 
functions that use it with AsyncProcess. The tests don't pass locally yet; 
I will push to RB once they do.

bq. Is it a legal condition?
It's historical: it means that someone could send a list with some nulls in the 
middle. I preferred to keep it.

bq. Since getWriteBuffer is removed and there's no way to get at this buffer.
I removed it because it was not in HTableInterface and it was an implementation 
leak. That said, everybody uses HTable directly, so I put it back.

bq. Code in HTable looks very non-thread-safe, I am assuming that is ok.
Yes, HTable is not thread-safe by design. The idea is to have no locks at all in 
this class (but I had to put some in AsyncProcess, as there is some multithreading 
because of the callbacks).



> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-15 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658857#comment-13658857
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

Can you post it on rb?

Also, there's still large scale (hundreds of lines) copy-pasted code shared 
between AsyncProcess and Process. If we don't get rid of Process fast (and I 
suspect realistically we won't) it can become a problem. Can at least some 
shared code be made shared?

Also, patch needs a little bit of rebasing.

{code}
  private R result;
{code}
Can you please update the main comment in this file on why this is necessary.

{code}
  Row row = it.next();
  if (row != null) {
{code}
Is it a legal condition?
{code} // to move to trace, {code}
Move to trace? :)

{code}
if (LOG.isTraceEnabled() && numAttempt > 0) {
{code}
Is numAttempt the number of tries or retries? The "> 1" above would seem to 
indicate the former, but this checks > 0.

{code}
  if (nextLog == 0){
    nextLog = EnvironmentEdgeManager.currentTimeMillis() + 3000;
  }
{code}
This can just be set before the start of the loop.

{code}
} else {
  if (EnvironmentEdgeManager.currentTimeMillis() > nextLog) {
    ...
  }
  nextLog = EnvironmentEdgeManager.currentTimeMillis() + 5000;
{code}
This will update nextLog in every iteration of the loop (after the first), so 
{code}if (EnvironmentEdgeManager.currentTimeMillis() > nextLog)
{code}
will never (well, almost never, technically) become true.
It only needs to be updated when logging.
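A small Java sketch of the suggested fix, assuming the loop performs a log check on every iteration: nextLog is set once before the loop and only moved forward when a message is actually written. ThrottledLogger is a hypothetical name; the injected LongSupplier clock stands in for EnvironmentEdgeManager.currentTimeMillis(), and the 3000/5000 ms values from the snippets are just illustrative parameters.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: throttle "still waiting" messages inside a polling loop.
class ThrottledLogger {
  private final LongSupplier clock;   // stands in for EnvironmentEdgeManager.currentTimeMillis()
  private final long intervalMs;
  private long nextLog;
  private int logged = 0;

  ThrottledLogger(LongSupplier clock, long initialDelayMs, long intervalMs) {
    this.clock = clock;
    this.intervalMs = intervalMs;
    // Initialised once, before the loop, instead of testing nextLog == 0 inside it.
    this.nextLog = clock.getAsLong() + initialDelayMs;
  }

  // Called on every loop iteration; returns true if a message was written.
  boolean maybeLog(String message) {
    long now = clock.getAsLong();
    if (now <= nextLog) {
      return false;
    }
    System.out.println(message);
    logged++;
    // Pushed forward only when we actually log, so the check above can
    // become true again once intervalMs has elapsed.
    nextLog = now + intervalMs;
    return true;
  }

  int loggedCount() {
    return logged;
  }
}
```

With a 3000 ms initial delay and a 5000 ms interval, a loop polling every 100 ms for 20 seconds logs four times, instead of never as in the reviewed code.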

{code}
  retriedErrors = new BatchErrors();
  RetriesExhaustedWithDetailsException exception = errors.makeException();
  errors  = new BatchErrors();
  retriedErrors = new BatchErrors();
{code}
Why does this reset retriedErrors? And twice, too.


{code}

/**
 * Methods and attributes to manage a batch process are grouped into this single class.
 * This allows, by creating a Process per batch process to ensure multithread safety.
 *
 * This code should be move to HTable once processBatchCallback is not supported anymore in
 * the HConnection interface.
{code}
The Javadoc for the new class is also copy-pasted. Can you please write javadoc that 
explains what it does?

Code in HTable looks very non-thread-safe, I am assuming that is ok.

{code}
  private HConnectionManager.HConnectionImplementation.AsyncProcess ap;
{code}
Why is there just one AsyncProcess per table? I thought it was supposed to be 
per batch request?

{code}
  ap.submit(writeAsyncBuffer);
  while (previousSize == writeAsyncBuffer.size()) {
    try {
      Thread.sleep(1000);
    } catch (InterruptedException e) {
      throw new InterruptedIOException("Still not sent: " + writeAsyncBuffer.size() + " rows.");
    }
    ap.submit(writeAsyncBuffer);
  }
{code}
Why does it keep submitting the same buffer again and again?

{code}
  if (!clearBufferOnFail){
    if (ap.hasError()){
      ap.waitUntilDone();
      writeAsyncBuffer.addAll(ap.getFailedOperation());
    }
  }
  throw
{code}
What is this for? If put calls doPut, doPut calls backgroundFlushCommits, and 
this branch puts some operations back into writeAsyncBuffer, an 
exception will be thrown out of put. What will the data left inside 
the buffer be used for?
Since getWriteBuffer is removed and there's no way to get at this buffer.

Nit: Batch.java has some whitespace added at the end.

ZKUtil has some changes in deleteNodeFailSilent that look unrelated.


> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-15 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658786#comment-13658786
 ] 

Ted Yu commented on HBASE-6295:
---

{code}
+  public List getFailedOperation(){
{code}
getFailedOperation -> getFailedOperations
{code}
+  public AsyncProcess(HConnection hci, byte[] tableName, ExecutorService pool,
+      Batch.Callback callback){
+    this.hci = (HConnectionImplementation)hci;
{code}
If methods of HConnectionImplementation which are not in HConnection are used, 
parameter hci should be declared as HConnectionImplementation.
{code}
+  public void submit(List rowList) throws IOException {
+    waitForMaximumTaskNumber(maxTotalConcurrentTasks);
+
+    if (!hasError()){
+      submit(rowList, 1, false);
+    }
{code}
For the else case, should the error condition be conveyed through a boolean return value 
or an exception?
{code}
+   * @param rowList
+   * @param numAttempt
+   * @throws IOException - if we can't locate a region after multiple retries.
+   */
+  private void submit(List rowList, int numAttempt, boolean force)
{code}
Please add the force parameter to the javadoc. A brief explanation of each parameter 
would help.
{code}
+    if (loc == null) {
+      throw new IOException("No location found, aborting submit.");
{code}
I think including tableName and row.getRow() in exception message would help 
debug.
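One possible way to follow this suggestion while addressing the concern raised elsewhere in this thread that a row key can be quite big is to include the table name and a length-capped, printable prefix of the row. This is a hypothetical helper, not HBase API; within HBase itself, Bytes.toStringBinary already handles the escaping part, and the 64-byte cap is an arbitrary illustrative choice.

```java
import java.io.IOException;

// Hypothetical helper, not HBase API: builds the "No location found" error
// with the table name and a printable, length-capped prefix of the row key.
final class LocationErrors {
  private static final int MAX_ROW_BYTES = 64; // arbitrary illustrative cap

  private LocationErrors() {}

  // Printable ASCII is kept as-is; other bytes and backslashes are escaped as \xNN.
  // At most maxLen input bytes are rendered, followed by the full length if truncated.
  static String printable(byte[] bytes, int maxLen) {
    StringBuilder sb = new StringBuilder();
    int n = Math.min(bytes.length, maxLen);
    for (int i = 0; i < n; i++) {
      int b = bytes[i] & 0xFF;
      if (b >= 0x20 && b < 0x7F && b != '\\') {
        sb.append((char) b);
      } else {
        sb.append(String.format("\\x%02X", b));
      }
    }
    if (bytes.length > maxLen) {
      sb.append("...(").append(bytes.length).append(" bytes)");
    }
    return sb.toString();
  }

  static IOException noLocation(byte[] tableName, byte[] row) {
    return new IOException("No location found for table "
        + printable(tableName, Integer.MAX_VALUE) + ", row "
        + printable(row, MAX_ROW_BYTES) + ", aborting submit.");
  }
}
```

Capping the row bytes keeps the message bounded regardless of key size, while still giving enough of the key to identify the failing region in most cases.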

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-02 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647695#comment-13647695
 ] 

Nicolas Liochon commented on HBASE-6295:


I've looked at the existing code, trying to remove the duplication:
- the parameterization of Process is likely wrong. It seems it was coded for the 
return type used in the callback, but then was misunderstood.
- If we want to remove the existing Process class, we should also remove 
the functions that use it. They should be replaced with the new one, and we 
should have callbacks for errors (today, the functions are not really 
asynchronous, as there is a callback only for success).

I think it can be done in another patch (and removing the existing interface is 
another subject).

So we're close to a reviewable patch imho. 

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-01 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646465#comment-13646465
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. Also, what happens to rows that are not added in AsyncProcess::submit? Not 
clear on that

[~sershe] Thanks for having a look. I wrote a short summary that I will put in 
the javadoc or in the hbase ref guide to explain what the code is supposed to 
do.
{panel} 
The puts are sent asynchronously. The interface is 100% compatible with the 
HTable interface that we had in 0.94 and before.
If autoflush is set to false, writes are buffered in HTable. When the buffer 
size goes beyond the value defined in "hbase.client.write.buffer", the buffer 
is sent asynchronously to the server. Retries are also managed 
independently. We block only:
- if the user code calls HTable#flushCommits
- if the user code calls HTable#close, because it implies a flushCommits
- if we run out of retries for an operation: in this case we finish all the 
writes in progress, and raise a single aggregated error.
- if we hit one of the flow control conditions detailed below.

It's possible to control the client stream with two parameters:
- "hbase.client.max.total.tasks": number of tasks that we can run 
simultaneously. If the buffer goes beyond "hbase.client.write.buffer" and the 
number of tasks currently in progress is greater than 
"hbase.client.max.total.tasks", we block until some of the tasks finish. This 
parameter must be set according to the cluster size: if there are 1000 
machines in the cluster, it may make sense to have a few thousand concurrent 
tasks for some tables.
- "hbase.client.max.perregion.tasks": number of tasks in progress for the same 
region. When doing a background flush, puts for a region that already has 
"hbase.client.max.perregion.tasks" or more tasks in progress are skipped, and 
remain in the HTable write buffer. They will be sent in a later background 
flush. If, when doing a background flush, all entries are skipped, we block 
until a slot becomes available.
{panel} 
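A simplified Java model of the two limits described in the panel, assuming rows can be mapped to a region name and that the caller reports each finished task through taskDone. FlowControl, maxTotalTasks and maxPerRegionTasks are hypothetical names standing in for "hbase.client.max.total.tasks" and "hbase.client.max.perregion.tasks"; this is a sketch of the described behavior, not the patch's code, and it starts at most one task per region per call.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the client-side flow control described above.
class FlowControl<R> {
  private final int maxTotalTasks;       // ~ "hbase.client.max.total.tasks"
  private final int maxPerRegionTasks;   // ~ "hbase.client.max.perregion.tasks"
  private final Map<String, Integer> tasksPerRegion = new HashMap<>();
  private int totalTasks = 0;

  FlowControl(int maxTotalTasks, int maxPerRegionTasks) {
    this.maxTotalTasks = maxTotalTasks;
    this.maxPerRegionTasks = maxPerRegionTasks;
  }

  // Accounts for one new task per region that has a free slot and removes the
  // rows it takes from the buffer. Rows for busy regions are skipped and stay
  // in the buffer for a later flush. Returns the rows of each started task.
  synchronized Map<String, List<R>> submit(List<R> buffer, Function<R, String> regionOf)
      throws InterruptedException {
    while (true) {
      while (totalTasks >= maxTotalTasks) {
        wait(); // too many tasks in flight overall: block until one finishes
      }
      Map<String, List<R>> started = new HashMap<>();
      Iterator<R> it = buffer.iterator();
      while (it.hasNext()) {
        R row = it.next();
        String region = regionOf.apply(row);
        List<R> batch = started.get(region);
        if (batch == null) {
          boolean regionBusy = tasksPerRegion.getOrDefault(region, 0) >= maxPerRegionTasks;
          if (regionBusy || totalTasks >= maxTotalTasks) {
            continue; // skipped: the row stays in the write buffer
          }
          batch = new ArrayList<>();
          started.put(region, batch);
          tasksPerRegion.merge(region, 1, Integer::sum);
          totalTasks++;
        }
        batch.add(row);
        it.remove();
      }
      if (!started.isEmpty() || buffer.isEmpty()) {
        return started;
      }
      wait(); // every row was skipped: block until a slot becomes available
    }
  }

  // Called when a task for the given region completes.
  synchronized void taskDone(String region) {
    tasksPerRegion.merge(region, -1, Integer::sum);
    totalTasks--;
    notifyAll();
  }
}
```

Under this model each submit drains only the rows it could start, so calling it repeatedly on the same buffer makes progress as slots free up. If the patch behaves similarly, that would be one reading of the HTable loop quoted in Sergey's review that resubmits writeAsyncBuffer until its size changes.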

Now that I wrote this, I think I have a bug in the way I manage errors and 
clearBufferOnFail: maybe the write buffer should contain only failed puts. I 
will check this.

bq.  Lots of the code seems to be copied from other parts of HCM, and the 
original is not removed, will it be removed? Otherwise there's duplication.
I really don't know. The problem I have is that this API is public. So while 
it's transparent in HTable (I change neither the interface nor its contract), 
it's not the case for the methods in HConnectionManager. That's why I added 
some methods: they allow keeping the existing interface of HConnectionManager 
while adding the background flush. I thought about implementing the previous 
synchronous interface with the new asynchronous methods, but I feel it could make 
them more fragile. I don't have a strong opinion here; the whole existing code 
could be refactored quite a lot. That's why the patch is not final, but I can't 
say whether the final patch will/should remove the duplication.


bq. RegionTooBusyException is a new one on me (I'll work on the ugly pb message 
in another issue)
Thanks, [~saint@gmail.com]. In my tests, it seems the server hangs at some 
point. Even if I stop the client and restart it, the server does not accept any new 
operations (for something like 5 minutes). I don't know if it's related to my 
changes, but it's fishy. I will do a test with a server without 6295.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646260#comment-13646260
 ] 

Sergey Shelukhin commented on HBASE-6295:
-

There are some nits in the patch that I won't go thru for now since it's not 
complete.
Lots of the code seems to be copied from other parts of HCM, and the original 
is not removed, will it be removed? Otherwise there's duplication.

Also, what happens to rows that are not added in AsyncProcess::submit? Not 
clear on that

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645874#comment-13645874
 ] 

Hadoop QA commented on HBASE-6295:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12581202/6295.v5.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the trunk's current 0 warnings).

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck
   org.apache.hadoop.hbase.replication.TestReplicationSmallTests
   org.apache.hadoop.hbase.master.cleaner.TestSnapshotFromMaster
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailover
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/5503//console

This message is automatically generated.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13645801#comment-13645801
 ] 

Nicolas Liochon commented on HBASE-6295:


v5 is starting to be feature complete. I think I implemented the same interface as 
today's.
It may be too early for a complete review, but feedback on the approach is 
welcome. Do I break something critical in the interface? I don't know if I will 
be able to remove the code duplication.
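The per-location loop proposed in the issue description (split per location as operations arrive, send a location's list in the background as soon as it is full, and wait only once at the end) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the v5 patch: `PerLocationBatcher`, the `locator` function and the `sender` callback are hypothetical names, and retry/error management, which the description calls the hard part, is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the HBase client API): operations are split per
 * location immediately, and a location's list is sent as soon as it is full.
 */
final class PerLocationBatcher<O, L> {
  private final Function<O, L> locator;        // "get location" for an operation
  private final BiConsumer<L, List<O>> sender; // sends one list to one region server
  private final int maxLocationSize;
  private final ExecutorService pool;

  PerLocationBatcher(Function<O, L> locator, BiConsumer<L, List<O>> sender,
                     int maxLocationSize, int threads) {
    this.locator = locator;
    this.sender = sender;
    this.maxLocationSize = maxLocationSize;
    this.pool = Executors.newFixedThreadPool(threads);
  }

  void process(List<O> ops) throws InterruptedException, ExecutionException {
    Map<L, List<O>> todo = new HashMap<>();
    List<Future<?>> inFlight = new ArrayList<>();
    for (O o : ops) {
      L location = locator.apply(o);
      List<O> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
      list.add(o);
      if (list.size() >= maxLocationSize) {
        // Hand the full list to a background task and start a fresh one for
        // this location; the loop does not wait for the send to finish.
        inFlight.add(send(location, todo.remove(location)));
      }
    }
    // "send remaining": partially filled lists.
    for (Map.Entry<L, List<O>> e : todo.entrySet()) {
      inFlight.add(send(e.getKey(), e.getValue()));
    }
    // "wait": block once, at the end; get() rethrows any send failure.
    for (Future<?> f : inFlight) {
      f.get();
    }
  }

  private Future<?> send(L location, List<O> batch) {
    return pool.submit(() -> sender.accept(location, batch));
  }

  void close() {
    pool.shutdown();
  }
}
```

A list is never touched again once it has been handed to the pool, which is what makes sending it from another thread safe without extra locking.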


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644701#comment-13644701
 ] 

stack commented on HBASE-6295:
--

RegionTooBusyException is a new one on me. (I'll work on the ugly pb message in 
another issue.)


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-29 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644676#comment-13644676
 ] 

Nicolas Liochon commented on HBASE-6295:


Example of an exception (with 6295 applied):
{noformat}
2013-04-29 12:05:51,515 DEBUG [IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root] ipc.HBaseClient: IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root: got response header exception { exceptionClassName: "org.apache.hadoop.hbase.exceptions.RegionTooBusyException" stackTrace: "org.apache.hadoop.hbase.exceptions.RegionTooBusyException: region is flushing\n\tat
 org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2477)\n\tat
 org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1869)\n\tat
 org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:3822)\n\tat
 org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3237)\n\tat
 sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)\n\tat
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)\n\tat
 java.lang.reflect.Method.invoke(Method.java:597)\n\tat
 org.apache.hadoop.hbase.ipc.ProtobufRpcServerEngine$Server.call(ProtobufRpcServerEngine.java:174)\n\tat
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1874)\n" }
2013-04-29 12:05:51,519 DEBUG [IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root] ipc.HBaseClient: IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root: closing ipc connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020: Protocol message tag had invalid wire type.
com.google.protobuf.InvalidProtocolBufferException: Protocol message tag had invalid wire type.
    at com.google.protobuf.InvalidProtocolBufferException.invalidWireType(InvalidProtocolBufferException.java:78)
    at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:498)
    at com.google.protobuf.GeneratedMessage$Builder.parseUnknownField(GeneratedMessage.java:438)
    at org.apache.hadoop.hbase.protobuf.generated.RPCProtos$ExceptionResponse$Builder.mergeFrom(RPCProtos.java:2225)
    at org.apache.hadoop.hbase.protobuf.generated.RPCProtos$ExceptionResponse$Builder.mergeFrom(RPCProtos.java:2071)
    at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:275)
    at org.apache.hadoop.hbase.protobuf.generated.RPCProtos$ResponseHeader$Builder.mergeFrom(RPCProtos.java:3713)
    at org.apache.hadoop.hbase.protobuf.generated.RPCProtos$ResponseHeader$Builder.mergeFrom(RPCProtos.java:3541)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:212)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:746)
    at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:238)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:282)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:760)
    at com.google.protobuf.AbstractMessageLite$Builder.mergeDelimitedFrom(AbstractMessageLite.java:288)
    at com.google.protobuf.AbstractMessage$Builder.mergeDelimitedFrom(AbstractMessage.java:752)
    at org.apache.hadoop.hbase.protobuf.generated.RPCProtos$ResponseHeader.parseDelimitedFrom(RPCProtos.java:3498)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.readResponse(HBaseClient.java:994)
    at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:661)
2013-04-29 12:05:51,521 DEBUG [IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root] ipc.HBaseClient: IPC Client (1280551684) connection to ip-10-68-155-141.ec2.internal/10.68.155.141:60020 from root: closed
{noformat}

I'm unclear on the impact, but at some point it stops working, so it could have 
one.


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-29 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13644489#comment-13644489
 ] 

Nicolas Liochon commented on HBASE-6295:


I've changed the per-server task limit to a per-region task limit.

Here are the results of a test on 5 datanodes (EC2 xlarge instances, table not 
split). I ran the test multiple times on each config.
{code}
./bin/ycsb load hbase -P workloads/workloada -s -p columnfamily=family -p recordcount=1 | grep -v CLEAN | grep -v INSERT | grep -v UPDATE
{code}
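YCSB prints one status line every 10 seconds, and the stalls in the runs below show up as consecutive "0 current ops/sec" intervals. To compare configurations across repeated runs, a small parser can count those intervals. This is a sketch assuming the status-line format shown in the output that follows; `YcsbStalls` is a hypothetical helper, not part of YCSB. Note that this output uses a comma as the decimal separator (French locale), so the parser normalizes it before parsing.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts zero-throughput intervals in YCSB status output. */
final class YcsbStalls {
  // Matches e.g. " 40 sec: 132239 operations; 0 current ops/sec;"
  private static final Pattern STATUS =
      Pattern.compile("^\\s*(\\d+) sec: (\\d+) operations; ([0-9.,]+) current ops/sec;");

  /** Returns {number of zero-throughput intervals, longest run of consecutive ones}. */
  static int[] stalls(List<String> lines) {
    int zero = 0;
    int run = 0;
    int longest = 0;
    for (String line : lines) {
      Matcher m = STATUS.matcher(line);
      if (!m.find()) {
        continue; // skip log lines interleaved with the status output
      }
      // The locale in these logs prints "5011,39"; normalize to "5011.39".
      double opsPerSec = Double.parseDouble(m.group(3).replace(',', '.'));
      if (opsPerSec == 0.0) {
        zero++;
        run++;
        longest = Math.max(longest, run);
      } else {
        run = 0;
      }
    }
    return new int[] {zero, longest};
  }
}
```

Log lines that do not match (for example the ZooKeeper reconnection messages) are ignored and do not break a run of stalled intervals.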

without 6295
{noformat}
 10 sec: 50159 operations; 5011,39 current ops/sec; [INSERT AverageLatency(us)=184,77]
 20 sec: 127679 operations; 7748,9 current ops/sec; [INSERT AverageLatency(us)=121,76]
 30 sec: 132239 operations; 455,77 current ops/sec; [INSERT AverageLatency(us)=242,47]
 40 sec: 132239 operations; 0 current ops/sec;
 50 sec: 132239 operations; 0 current ops/sec;
 60 sec: 132239 operations; 0 current ops/sec;
 70 sec: 132239 operations; 0 current ops/sec;
 80 sec: 132239 operations; 0 current ops/sec;
 90 sec: 167676 operations; 3532,04 current ops/sec; [INSERT AverageLatency(us)=1955,36]
 100 sec: 250799 operations; 8308,98 current ops/sec; [INSERT AverageLatency(us)=114,35]
 110 sec: 341999 operations; 9115,44 current ops/sec; [INSERT AverageLatency(us)=107,25]
 120 sec: 428639 operations; 8660,54 current ops/sec; [INSERT AverageLatency(us)=109,67]
 130 sec: 507844 operations; 7916,54 current ops/sec; [INSERT AverageLatency(us)=125,44]
 140 sec: 588239 operations; 8035,48 current ops/sec; [INSERT AverageLatency(us)=120,31]
 150 sec: 674879 operations; 8660,54 current ops/sec; [INSERT AverageLatency(us)=109,59]
 160 sec: 729599 operations; 5469,27 current ops/sec; [INSERT AverageLatency(us)=111,25]
 170 sec: 729599 operations; 0 current ops/sec;
 180 sec: 729599 operations; 0 current ops/sec;
 190 sec: 729599 operations; 0 current ops/sec;
 200 sec: 729599 operations; 0 current ops/sec;
 210 sec: 729599 operations; 0 current ops/sec;
 220 sec: 729599 operations; 0 current ops/sec;
 230 sec: 819076 operations; 8926,28 current ops/sec; [INSERT AverageLatency(us)=824,06]
 240 sec: 911999 operations; 9287,66 current ops/sec; [INSERT AverageLatency(us)=102,68]
 250 sec: 998639 operations; 8659,67 current ops/sec; [INSERT AverageLatency(us)=109,97]
 260 sec: 1089839 operations; 9116,35 current ops/sec; [INSERT AverageLatency(us)=105,23]
 270 sec: 1094399 operations; 455,77 current ops/sec; [INSERT AverageLatency(us)=219,31]
 280 sec: 1094399 operations; 0 current ops/sec;
 290 sec: 1094399 operations; 0 current ops/sec;
 300 sec: 1098959 operations; 455,77 current ops/sec; [INSERT AverageLatency(us)=7326,11]
 310 sec: 1098959 operations; 0 current ops/sec;
 320 sec: 1098959 operations; 0 current ops/sec;
 330 sec: 1131203 operations; 3222,79 current ops/sec; [INSERT AverageLatency(us)=1109,09]
2013-04-29 09:06:04,357 INFO  [Thread-1] zookeeper.RecoverableZooKeeper (RecoverableZooKeeper.java:(119)) - The identifier of this process is hconnection-0xe6c
2013-04-29 09:06:04,373 INFO  [Thread-1-SendThread(ip-10-68-135-50.ec2.internal:2181)] zookeeper.ClientCnxn (ClientCnxn.java:logStartConnect(966)) - Opening socket connection to server ip-10-68-135-50.ec2.internal/10.68.135.50:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-04-29 09:06:04,380 INFO  [Thread-1-SendThread(ip-10-68-135-50.ec2.internal:2181)] zookeeper.ClientCnxn (ClientCnxn.java:primeConnection(849)) - Socket connection established to ip-10-68-135-50.ec2.internal/10.68.135.50:2181, initiating session
2013-04-29 09:06:04,391 INFO  [Thread-1-SendThread(ip-10-68-135-50.ec2.internal:2181)] zookeeper.ClientCnxn (ClientCnxn.java:onConnected(1207)) - Session establishment complete on server ip-10-68-135-50.ec2.internal/10.68.135.50:2181, sessionid = 0x13e55e55e6e000e, negotiated timeout = 18
 340 sec: 1158239 operations; 2702,52 current ops/sec; [INSERT AverageLatency(us)=39,33]
 350 sec: 1158239 operations; 0 current ops/sec;
 360 sec: 1158239 operations; 0 current ops/sec;
 370 sec: 1158239 operations; 0 current ops/sec;
 380 sec: 1158239 operations; 0 current ops/sec;
 390 sec: 1158239 operations; 0 current ops/sec;
 400 sec: 1244879 operations; 8659,67 current ops/sec; [INSERT AverageLatency(us)=790,06]
 410 sec: 1326959 operations; 8203,9 current ops/sec; [INSERT AverageLatency(us)=52,75]
 420 sec: 1326959 operations; 0 current ops/sec;
 430 sec: 1326959 operations; 0 current ops/sec;
 440 sec: 1326959 operations; 0 current ops/sec;
 450 sec: 1326959 operations; 0 current ops/sec;
 460 sec: 1326959 operations; 0 current ops/sec;
 470 sec: 1349759 operations; 2279,09 current ops/sec; [INSERT AverageLatency(us)=2868,24]
 480 sec: 1527599 operations; 1777

[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-26 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643087#comment-13643087
 ] 

Nicolas Liochon commented on HBASE-6295:


v4. I added a per-server control: the client cannot have more than X requests 
in progress on the same server. When this number is reached, we continue with 
the other servers, but the operations for the overloaded servers are kept in the 
buffer. This will limit the rpc.timeout effect.

It's still a hack in terms of implementation, but hopefully it's acceptable in 
terms of functionality. I've got some tests running locally; if they pass, I will 
run one on a real cluster.
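The per-server control described above (at most X concurrent requests per server, with operations for saturated servers left in the buffer) can be sketched as follows. This is an illustration under assumptions, not the v4 patch: `PerServerTaskLimiter`, `takeSendable` and the `serverOf` function are hypothetical names, and each call builds at most one task per server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a per-server cap on concurrent tasks (not HBase's code). */
final class PerServerTaskLimiter<S> {
  private final int maxTasksPerServer;
  private final Map<S, AtomicInteger> inFlight = new ConcurrentHashMap<>();

  PerServerTaskLimiter(int maxTasksPerServer) {
    this.maxTasksPerServer = maxTasksPerServer;
  }

  /** Reserves a slot for one more task on this server; false if it is saturated. */
  boolean tryAcquire(S server) {
    AtomicInteger count = inFlight.computeIfAbsent(server, k -> new AtomicInteger());
    while (true) {
      int current = count.get();
      if (current >= maxTasksPerServer) {
        return false;
      }
      if (count.compareAndSet(current, current + 1)) {
        return true;
      }
    }
  }

  /** Called when a task for this server completes, successfully or not. */
  void release(S server) {
    AtomicInteger count = inFlight.get(server);
    if (count != null) {
      count.decrementAndGet();
    }
  }

  /**
   * Removes from the buffer and groups the operations whose server accepted a
   * new task; operations for saturated servers stay in the buffer.
   */
  <O> Map<S, List<O>> takeSendable(List<O> buffer, Function<O, S> serverOf) {
    Map<S, List<O>> tasks = new HashMap<>();
    Set<S> saturated = new HashSet<>();
    Iterator<O> it = buffer.iterator();
    while (it.hasNext()) {
      O op = it.next();
      S server = serverOf.apply(op);
      if (saturated.contains(server)) {
        continue; // server is at its limit: keep the operation buffered
      }
      List<O> task = tasks.get(server);
      if (task == null) {
        if (!tryAcquire(server)) {
          saturated.add(server);
          continue;
        }
        task = new ArrayList<>();
        tasks.put(server, task);
      }
      task.add(op);
      it.remove();
    }
    return tasks;
  }
}
```

Switching to the per-region limit mentioned in the 2013-04-29 comment would only change the key type: `serverOf` would return a region identifier instead of a server.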


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-22 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637931#comment-13637931
 ] 

Nicolas Liochon commented on HBASE-6295:


bq. On the OOME, is that building a value to write? It is not receiving a 
response? And we are setting a bad size of response?
I don't know; I haven't reproduced it. I've run a test with 5 datanodes on 
plain trunk, and it seems much more stable than with 3, so I will redo my tests 
with this config.

bq. Please use EnvironmentEdgeManager.currentTimeMillis().
It's still at an early stage. I prefer to run some workload tests before going 
for the final design. Performance-wise it seems great, but it also seems that the 
increased workload on the server triggers some bad behavior. I wonder if I should 
add a way to limit the load sent by the client to the RS, for example a maximum 
number of tasks per region server.

I will do a new test tomorrow and publish a version then...


[jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636635#comment-13636635
 ] 

Ted Yu commented on HBASE-6295:
---

{code}
+  private final long creationTime = System.currentTimeMillis();
...
+  final long waitingTime = delay + creationTime - System.currentTimeMillis();
{code}
Please use EnvironmentEdgeManager.currentTimeMillis().
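The point of the request is that HBase's EnvironmentEdgeManager routes time lookups through a replaceable source, so tests can control the clock instead of depending on System.currentTimeMillis(). A minimal self-contained sketch of the same idea, assuming a plain `LongSupplier` stands in for the time source; `DelayedSend` is a hypothetical name, not a class from the patch. Both `creationTime` and the later reading should come from the same source, otherwise an injected clock only covers half of the computation.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: a delayed task whose timing uses an injectable clock. */
final class DelayedSend {
  private final LongSupplier clock; // e.g. System::currentTimeMillis in production
  private final long delay;
  private final long creationTime;

  DelayedSend(LongSupplier clock, long delay) {
    this.clock = clock;
    this.delay = delay;
    this.creationTime = clock.getAsLong(); // same time source as waitingTime()
  }

  /** Milliseconds left before the task is due; zero or negative means it is due. */
  long waitingTime() {
    return delay + creationTime - clock.getAsLong();
  }
}
```

In production this would be built with `new DelayedSend(System::currentTimeMillis, delay)`; a test can pass a manually advanced counter instead and check the result deterministically.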
{code}
-"hbase.client.write.buffer", 2097152);
+"hbase.client.write.buffer", 2097152 * 2);
{code}
Why increase the default write buffer size?
{code}
   public void flushCommits() throws IOException {
+if (backgroundSendSize > 0){
+  backgroundFlushCommits();
{code}
Should we also check currentWriteBufferSize > 0 in the if statement?

