[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2012-07-02 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6295:
-

Component/s: performance
   Tags: noob
 Labels: noob  (was: )

Making it noob.  Could be a fun project for someone who wants to do something a 
little involved.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: client, performance
>Affects Versions: 0.96.0
>Reporter: nkeywal
>  Labels: noob
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send when there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list must be 
> shared with the operations added to the todolist. But it's doable.
> It's mainly interesting for 'big' writes.
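
The proposed loop in the description above can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the code from any of the attached patches: the class name PerLocationBatcher, the locator function (standing in for a region lookup) and the sender callback (standing in for the RPC to a region server) are all hypothetical, a batch is sent once it reaches maxLocationSize, and the error management and retry handling mentioned above are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of "presplit and send in background"; not HBase client code. */
final class PerLocationBatcher<O> {
    private final int maxLocationSize;
    private final Function<O, String> locator;         // assumption: stands in for a region lookup
    private final BiConsumer<String, List<O>> sender;  // assumption: stands in for the RPC
    private final ExecutorService pool;
    private final Map<String, List<O>> todo = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize, Function<O, String> locator,
                       BiConsumer<String, List<O>> sender, ExecutorService pool) {
        this.maxLocationSize = maxLocationSize;
        this.locator = locator;
        this.sender = sender;
        this.pool = pool;
    }

    /** Called from a single thread; only the sends run on the pool. */
    void submitAll(List<O> operations) {
        for (O o : operations) {
            String location = locator.apply(o);                    // get location
            List<O> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(o);                                           // add o to location.todolist
            if (list.size() >= maxLocationSize) {
                // Hand the full list over and start a fresh one; do not wait.
                sendInBackground(location, todo.remove(location));
            }
        }
        // send remaining
        for (Map.Entry<String, List<O>> e : todo.entrySet()) {
            sendInBackground(e.getKey(), e.getValue());
        }
        todo.clear();
        // wait for every background send to finish
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    private void sendInBackground(String location, List<O> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch), pool));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        PerLocationBatcher<Integer> batcher = new PerLocationBatcher<Integer>(
            2,
            i -> i % 2 == 0 ? "rs-even" : "rs-odd",
            (loc, batch) -> System.out.println(loc + " <- " + batch),
            pool);
        batcher.submitAll(Arrays.asList(1, 2, 3, 4, 5));
        pool.shutdown();
    }
}
```

Removing the full list from the map before handing it to the pool means the background send never shares a mutable list with the loop. A real implementation would still need the shared retry list described above.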

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-14 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-14 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v11.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v1.patch, 6295.v2.patch, 
> 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 
> 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-14 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v1.patch, 6295.v2.patch, 
> 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 
> 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-16 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 
> 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 
> 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-16 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v12.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 
> 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 
> 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-16 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 
> 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 
> 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 
> 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v14.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 
> 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-18 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 
> 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-21 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v15.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
> 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-24 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

   Resolution: Fixed
Fix Version/s: 0.95.2
 Release Note: The puts are now streamed, i.e. sent asynchronously to the 
region servers, if autoflush is set to false. If a region server is slow or does 
not respond, its puts are kept in the write buffer while the others are sent 
to their respective region servers, until the write buffer is full. This feature 
keeps the semantics of the interface that already existed in 0.94 when using 
autoflush.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)
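
The buffering behaviour described in the release note can be pictured with a small single-threaded model. It is a sketch under assumptions, not the HBase client: StreamingWriteBufferSketch and its methods are hypothetical names, immediate delivery stands in for the asynchronous send, and returning false from put stands in for whatever the client does once the write buffer is full (the note does not spell that out).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the release note; not the HBase write buffer. */
final class StreamingWriteBufferSketch {
    private final int capacity;                          // max puts held for slow servers
    private final Set<String> slowServers = new HashSet<>();
    private final Map<String, List<String>> buffered = new LinkedHashMap<>();
    private final List<String> delivered = new ArrayList<>();
    private int bufferedCount = 0;

    StreamingWriteBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    void markSlow(String server) {
        slowServers.add(server);
    }

    /** The server answers again: everything held back for it is delivered. */
    void markResponsive(String server) {
        slowServers.remove(server);
        List<String> rows = buffered.remove(server);
        if (rows != null) {
            for (String row : rows) {
                delivered.add(server + ":" + row);
            }
            bufferedCount -= rows.size();
        }
    }

    /** Returns false when the buffer is full, i.e. the caller would have to wait. */
    boolean put(String server, String row) {
        if (bufferedCount >= capacity) {
            return false;
        }
        if (slowServers.contains(server)) {
            // Slow server: keep the put in the buffer.
            buffered.computeIfAbsent(server, k -> new ArrayList<>()).add(row);
            bufferedCount++;
        } else {
            // Responsive server: the put goes out right away.
            delivered.add(server + ":" + row);
        }
        return true;
    }

    List<String> delivered() {
        return delivered;
    }

    int bufferedCount() {
        return bufferedCount;
    }
}
```

In this model, puts for responsive servers keep flowing while a slow server's puts accumulate, and writes stall only once the held-back puts reach the capacity.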

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 
> 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 
> 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-27 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.addendum.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0, 0.95.2
>
> Attachments: 6295.addendum.patch, 6295.v11.patch, 6295.v12.patch, 
> 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-06-28 Thread Elliott Clark (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elliott Clark updated HBASE-6295:
-

Attachment: hbase-ycsb-workloads Build time trend.png

So the good news is this patch really helped our perf on YCSB.
Build #91 is where this patch went in.

The bad news is that the addendum didn't fix integration tests on a real 
cluster.  They still fail.

{code}
2013-06-28 13:08:32,882 DEBUG [hbase-table-pool-175-thread-2] 
client.AsyncProcess: Attempt #10/14 failed for 1 operations on server 
a1806.halxg.cloudera.com,60020,1372449350703, resubmitting 1, 
tableName=IntegrationTestDataIngestSlowDeterministic, 
location=region=IntegrationTestDataIngestSlowDeterministic,bbb0,1372449851893.b7040e878147caf1f6de338faad504be.,
 hostname=a1806.halxg.cloudera.com,60020,1372449350703, seqNum=1, last 
exception was: org.apache.hadoop.hbase.exceptions.NotServingRegionException: 
org.apache.hadoop.hbase.exceptions.NotServingRegionException: Region is not 
online: b7040e878147caf1f6de338faad504be
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2565)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3852)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3188)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20938)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
 - sleeping 3205 ms.
2013-06-28 13:08:32,939 DEBUG [hbase-table-pool-173-thread-1] 
client.ClientScanner: Scan table=.META., 
startRow=IntegrationTestDataIngestSlowDeterministic,2220,00
2013-06-28 13:08:32,945 DEBUG [hbase-table-pool-173-thread-2] 
client.AsyncProcess: Attempt #10/14 failed for 1 operations on server 
a1806.halxg.cloudera.com,60020,1372449350703, resubmitting 1, 
tableName=IntegrationTestDataIngestSlowDeterministic, 
location=region=IntegrationTestDataIngestSlowDeterministic,2220,1372449851892.c8a2a77e0690901df245ac9fff088a0e.,
 hostname=a1806.halxg.cloudera.com,60020,1372449350703, seqNum=1, last 
exception was: org.apache.hadoop.hbase.exceptions.NotServingRegionException: 
org.apache.hadoop.hbase.exceptions.NotServingRegionException: Region is not 
online: c8a2a77e0690901df245ac9fff088a0e
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2565)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3852)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3188)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20938)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
at 
org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
 - sleeping 3201 ms.
2013-06-28 13:08:32,996 DEBUG [hbase-table-pool-174-thread-1] 
client.ClientScanner: Scan table=.META., 
startRow=IntegrationTestDataIngestSlowDeterministic,bbb0,00
2013-06-28 13:08:33,030 DEBUG [hbase-table-pool-174-thread-1] 
client.ClientScanner: Finished region={ENCODED => 1028785192, NAME => 
'.META.,,1', STARTKEY => '', ENDKEY => ''}
2013-06-28 13:08:33,032 DEBUG [hbase-table-pool-174-thread-2] 
client.AsyncProcess: Attempt #10/14 failed for 1 operations on server 
a1806.halxg.cloudera.com,60020,1372449350703, resubmitting 1, 
tableName=IntegrationTestDataIngestSlowDeterministic, 
location=region=IntegrationTestDataIngestSlowDeterministic,bbb0,1372449851893.b7040e878147caf1f6de338faad504be.,
 hostname=a1806.halxg.cloudera.com,60020,1372449350703, seqNum=1, last 
exception was: org.apache.hadoop.hbase.exceptions.NotServingRegionException: 
org.apache.hadoop.hbase.exceptions.NotServingRegionException: Region is not 
online: b7040e878147caf1f6de338faad504be
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2565)
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3852)
at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3188)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20938)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
 - sleeping 3201 ms.
2013-06-28 13:08:33,371 DEBUG [hbase-table-pool-177-thread-2] 
client.ClientScanner: Scan table=.META., 
startRow=IntegrationTestDataIngestSlowDeterministic,b328,00
2013-06-28 13:08:33,408 DEBUG [hbase-table-pool-177-thread-2] 
client.

[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-26 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v4.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send as soon as there is 
> enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list must be 
> shared with the operations added to the todolist. But it's doable.
> It's mainly interesting for 'big' writes.
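
The proposed loop could be sketched in Java roughly as follows. This is only an
illustration of the pseudocode above, not the code in any of the attached
patches: the class PerLocationBatcher, the locator and sender callbacks, and
the use of CompletableFuture are all assumptions made for the example. It also
flushes a list once it reaches maxLocationSize (>=) rather than once it exceeds
it, and it leaves out the error management and retry sharing that the
description calls non-trivial.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the HBase client.
class PerLocationBatcher<O> {
    // Maps an operation to its location (assumed here to be a plain string key).
    private final Function<O, String> locator;
    // Sends one per-location list asynchronously (assumed callback, no retries).
    private final BiFunction<String, List<O>, CompletableFuture<Void>> sender;
    private final int maxLocationSize;

    PerLocationBatcher(Function<O, String> locator,
                       BiFunction<String, List<O>, CompletableFuture<Void>> sender,
                       int maxLocationSize) {
        this.locator = locator;
        this.sender = sender;
        this.maxLocationSize = maxLocationSize;
    }

    void submitAll(List<O> operations) {
        Map<String, List<O>> todo = new HashMap<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();

        for (O o : operations) {
            // get location, add o to location.todolist
            String location = locator.apply(o);
            List<O> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
            list.add(o);

            if (list.size() >= maxLocationSize) {
                // send location.todolist; don't wait, continue the loop
                inFlight.add(sender.apply(location, list));
                // "clear" by dropping the entry, so the list handed to the
                // sender is never mutated afterwards
                todo.remove(location);
            }
        }

        // send remaining
        for (Map.Entry<String, List<O>> e : todo.entrySet()) {
            inFlight.add(sender.apply(e.getKey(), e.getValue()));
        }

        // wait for every outstanding send; a failed send surfaces here
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
    }
}
```

Dropping the map entry instead of calling clear() on the list is a deliberate
choice in this sketch: the sender may still be reading the list in the
background, so it must not be reused for later operations.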

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v5.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-30 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-02 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v6.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-02 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Fix Version/s: 0.98.0
   Status: Patch Available  (was: Open)

local tests ok. put on rb as well.

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v8.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v9.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-05-28 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Fix For: 0.98.0
>
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 
> 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-10 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Assignee: Nicolas Liochon

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-10 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v1.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-10 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v2.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch
>
>


[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-11 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch



[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Open  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch



[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Attachment: 6295.v3.patch

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch



[jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

2013-04-12 Thread Nicolas Liochon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Liochon updated HBASE-6295:
---

Status: Patch Available  (was: Open)

> Possible performance improvement in client batch operations: presplit and 
> send in background
> 
>
> Key: HBASE-6295
> URL: https://issues.apache.org/jira/browse/HBASE-6295
> Project: HBase
>  Issue Type: Improvement
>  Components: Client, Performance
>Affects Versions: 0.95.2
>Reporter: Nicolas Liochon
>Assignee: Nicolas Liochon
>  Labels: noob
> Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch
