[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-07 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924043#comment-13924043
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Attached is a trunk-based first cut of the Writer-switch functionality.

Here is a brief description of what this patch adds:
a) An additional writer, which is used in case the current writer becomes slow 
and the WALSwitchPolicy agrees to kick off the switch.
b) A WALSwitchPolicy interface. A concrete policy decides when to do the 
switch, based on the passed params. For a start, there is one impl, 
AggressiveWALSwitchPolicy, which switches when even one sync op takes more than 
the threshold time. It is very useful for testing this feature (in effect, it 
acts as a "chaos monkey" for this feature, since it switches a lot). I plan 
to add a less aggressive one, which also takes into account the last time we 
switched and how many recent ops exceeded the threshold after the switch.
c) A thread pool for sync ops. Each SyncRunner submits a callable for the sync 
call and waits on the returned Future.
d) A SyncLatencyWatcher thread that monitors sync-op latency and sends input to 
the WALSwitchPolicy to make the switch decision.
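To make (b) concrete, the policy contract might look roughly like the following. This is a hypothetical sketch: the names WALSwitchPolicy and AggressiveWALSwitchPolicy come from the description above, but the method signature and nanosecond units are illustrative, not the actual patch.

```java
/** Hypothetical sketch of the WALSwitchPolicy contract described in (b). */
interface WALSwitchPolicy {
  /** @param syncDurationNanos latency of the most recent sync op */
  boolean shouldSwitch(long syncDurationNanos);
}

/** Switches as soon as even one sync op exceeds the threshold. */
class AggressiveWALSwitchPolicy implements WALSwitchPolicy {
  private final long thresholdNanos;

  AggressiveWALSwitchPolicy(long thresholdNanos) {
    this.thresholdNanos = thresholdNanos;
  }

  @Override
  public boolean shouldSwitch(long syncDurationNanos) {
    return syncDurationNanos > thresholdNanos;
  }
}
```

A less aggressive policy would implement the same interface but also consult state such as the time of the last switch.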

h4. How does it work?
A SyncRunner submits a sync call to the writer. The SyncLatencyWatcher monitors 
the call duration and sends input to the WALSwitchPolicy. If the latter decides 
to make the switch, the following sequence of events happens:
1) Set FSHLog#switching true. This blocks the RingBufferEventHandler thread in 
its onEvent method.
2) Interrupt the SyncRunner threads to unblock them from their current sync 
call, and wait till they reach a safe point.
3) Grab their Append lists (i.e., whatever they were trying to sync). 
Consolidate and sort them. These are the "in-flight" edits we need to append to 
the new Writer.
4) Get the max SyncFuture object and note its sequenceId. After switching, we 
need to unblock all handlers that are waiting on sequenceIds <= 
max_syncedSequenceId.
5) Take the "other" writer, and append-sync these "in-flight" edits. Set the 
current writer to this writer.
6) Tell the SyncRunners that the switch is done, and let them take new writes 
(complete the latch).
7) Set FSHLog#switching true. 
8) Roll the old writer.

It is worth noting that if the sync-op delay is due to a concurrent log roll, 
no switch happens. This avoids unnecessary switches.
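The sync-pool mechanics from (c) and (d) that drive this sequence can be sketched as follows. This is a hypothetical, simplified sketch: the class and method names are illustrative, and the real SyncRunner/SyncLatencyWatcher code in the patch differs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of items (c) and (d): a SyncRunner submits the sync
 * call to a pool and waits on the returned Future; the measured latency is
 * what the SyncLatencyWatcher would feed to the WALSwitchPolicy.
 */
public class SyncPoolSketch {
  private final ExecutorService syncPool = Executors.newFixedThreadPool(2);

  /** Runs one sync op on the pool and returns its latency in nanoseconds. */
  public long timedSync(Runnable syncOp) throws Exception {
    long start = System.nanoTime();
    Future<?> f = syncPool.submit(syncOp); // SyncRunner submits a callable...
    f.get();                               // ...and waits on the returned Future
    return System.nanoTime() - start;      // input to the WALSwitchPolicy
  }

  public void shutdown() {
    syncPool.shutdown();
  }
}
```

Running the sync on a separate pool is what allows the SyncRunner to be interrupted and pulled to a safe point while the stuck sync call is still outstanding.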

I intend to add metrics for the number of in-flight edits used, etc. But the 
above patch is good for giving a sense of how it looks.

h4. Testing:
I tested it on trunk and compared the WAL-switch enabled and disabled modes. I 
also tested by introducing hiccups (a similar approach to the one used in the 
above doc).
h5. No hiccups:
1. Trunk:
2014-03-07 06:37:45,049 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 212.120s 47143.129ops/s
2014-03-07 06:42:38,271 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 214.548s 46609.617ops/s
2014-03-07 06:47:43,457 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(413)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 223.635s 44715.723ops/s

2. Trunk + patch, but with switch disabled:
2014-03-07 04:54:50,451 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 218.036s 45863.988ops/s
2014-03-07 04:59:55,640 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 223.940s 44654.816ops/s
2014-03-07 05:04:56,496 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 219.976s 45459.504ops/s

3. Trunk + patch, switch enabled:
2014-03-07 06:12:04,946 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 214.938s 46525.043ops/s
2014-03-07 06:16:59,603 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 217.718s 45930.973ops/s
2014-03-07 06:21:48,768 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBenchmarkResult(438)) - Summary: threads=10, 
iterations=100, syncInterval=10 took 216.949s 46093.781ops/s

h5. With a sleep of 2 sec after every 2k sync ops (this involved some 
instrumentation in ProtobufLogWriter):
1. Trunk:
2014-03-06 20:52:03,212 INFO  wal.HLogPerformanceEvaluation 
(HLogPerformanceEvaluation.java:logBench

[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924211#comment-13924211
 ] 

Hadoop QA commented on HBASE-10278:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633401/10278-wip-1.1.patch
  against trunk revision .
  ATTACHMENT ID: 12633401

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8924//console

This message is automatically generated.

> Provide better write predictability
> ---
>
> Key: HBASE-10278
> URL: https://issues.apache.org/jira/browse/HBASE-10278
> Project: HBase
>  Issue Type: New Feature
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Attachments: 10278-wip-1.1.patch, Multiwaldesigndoc.pdf
>
>
> Currently, HBase has one WAL per region server. 
> Whenever there is any latency in the write pipeline (due to whatever reasons 
> such as n/w blip, a node in the pipeline having a bad disk, etc), the overall 
> write latency suffers. 
> Jonathan Hsieh and I analyzed various approaches to tackle this issue. We 
> also looked at HBASE-5699, which talks about adding concurrent multi WALs. 
> Along with performance numbers, we also focused on design simplicity, 
> minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. 
> Considering all these parameters, we propose a new HLog implementation with 
> WAL Switching functionality.
> Please find attached the design doc for the same. It introduces the WAL 
> Switching feature, and experiments/results of a prototype implementation, 
> showing the benefits of this feature.
> The second goal of this work is to serve as a building block for concurrent 
> multiple WALs feature.
> Please review the doc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-07 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924256#comment-13924256
 ] 

Ted Yu commented on HBASE-10278:


bq. 7) Set FSHLog#switching true. 

Did you mean set to false?

Can you put the patch on review board?

Thanks



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-07 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924527#comment-13924527
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Thanks Ted, and yes, you are right. 
Working more on error handling; will make an RB request once that's done. 
Stay tuned.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-10 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926171#comment-13926171
 ] 

Jonathan Hsieh commented on HBASE-10278:


It has been a few days; can you post the current WIP on review board (noting 
that it is WIP)?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-10 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926213#comment-13926213
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Sure, I have created a WIP rb request:  https://reviews.apache.org/r/18983/




[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-10 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926453#comment-13926453
 ] 

stack commented on HBASE-10278:
---

bq. Interrupt the SyncRunner threads to unblock them from their current sync 
call, and wait till they reach a safe point.

Any issues interrupting?  I've found interrupting hdfs a PITA, or rather, the 
variety of exceptions that can come up are many... it's tricky figuring out 
which can be caught and which cannot.
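One common defensive pattern for this concern can be sketched as below. This is a hedged sketch, not code from the patch: it only shows the general idea of normalizing the several guises a thread interrupt can take when it surfaces out of a blocking HDFS call.

```java
import java.io.InterruptedIOException;
import java.nio.channels.ClosedByInterruptException;

/**
 * Hypothetical sketch: classify whether an exception thrown by a blocking
 * call (e.g. an HDFS sync) was caused by a thread interrupt, since
 * interrupts can surface as several different exception types, possibly
 * wrapped inside other exceptions.
 */
public class InterruptClassifier {
  public static boolean causedByInterrupt(Throwable t) {
    // Walk the cause chain looking for any interrupt-flavored exception.
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (cur instanceof InterruptedException
          || cur instanceof InterruptedIOException
          || cur instanceof ClosedByInterruptException) {
        return true;
      }
    }
    return false;
  }
}
```

A SyncRunner heading to its safe point could use such a check to distinguish a deliberate unblock-interrupt from a genuine I/O failure, while still restoring the thread's interrupt status.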

bq. Set FSHLog#switching true.

Must every new append (or is it sync?) run over this new volatile?

bq. 3) Grab their Append lists (i.e., whatever they were trying to sync). 
Consolidate, and sort it. These are the "in-flight" edits we need to append to 
the new Writer.

'sort'?  We've given these items their seqid at this stage, right?  Will the 
sort mess this up?

More comments over in rb







[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932433#comment-13932433
 ] 

stack commented on HBASE-10278:
---

Also, I was wondering: what can we do to prevent a pipeline being set up on the 
same set of disks as those of the faulty writer that we are switching off?  
Anything we can do to hint the dfsclient?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-13 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934624#comment-13934624
 ] 

Jonathan Hsieh commented on HBASE-10278:


{quote}
* Added a thread pool which executes the writer.sync() call. SR submits a 
callable & waits on the returned Future
{quote}

So one concern is that the common case now has to go through an extra thread 
context switch.  Can we make the extra context switches happen only in the 
rarer switching case?

{quote}
* Interrupt SyncRunners to unblock them from current sync call, and wait till 
all of them reach a “safepoint”.
* When all SRs are interrupted, they signal switch that they have reached the 
safepoint, and Switch process can swap the writer 
* Take all the in-flight edits from all SR(s); append-sync them in same order 
using the “reserve” writer
{quote}

So with this approach, we need to wait for the interrupts to be handled before 
we can start making progress using the reserve writer.

Couldn't we just take the roll writer lock, block incoming appends, take the 
list, and start using the reserve writer once we've initiated the switch 
process?  We could avoid the extra thread context switches in the common case.  
We may end up with potential duplicate edits in the old log and the new reserve 
log, but that is ok.  Instead of interrupting the new sync thread pool threads, 
we'd interrupt the SyncRunner threads after we've moved to the reserve writer, 
potentially after we have unblocked the ring buffer handler and even after the 
roll writer lock.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-14 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934926#comment-13934926
 ] 

Jean-Marc Spaggiari commented on HBASE-10278:
-

Are there any metrics to say "we have switched" that number of times, "we have 
waited an average of x ms" before switching, "in total the operation took x 
ms", and "in total it would have taken x ms" (based on the duration of the 
first thread)?  Or do we not record that for now?




[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863843#comment-13863843
 ] 

Sergey Shelukhin commented on HBASE-10278:
--

Skimmed the doc, looks really nice. I do think that out-of-order WAL should 
eventually become ok (we will get per-region mvcc from seqId-mvcc merge, and 
mvcc in WAL from this or several other jiras). One thing I might have missed - 
since it currently requires log rolling, would it need throttling for 
switching? If there's a long sequence of network hiccups from the machine (i.e. 
to both files), it might roll lots of tiny logs.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-06 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863921#comment-13863921
 ] 

ramkrishna.s.vasudevan commented on HBASE-10278:


I read the document.  Looks nice.
A few questions, to clarify whether my understanding is right:
When a log switch happens, say edits 1 .. 10 are in WAL A, and due to the 
switch, edits 11 .. 13 are in WAL B.
If the above is to happen, does the log roll for WAL A have to be completed by 
blocking all writes?  Will this be costly?  How costly will it be?
If the rollwriter happens and at the same time we start taking writes on WAL B, 
the above-mentioned scenario happens.  So in that case we may have out-of-order 
edits during log split if this RS crashes, right?
Currently the assumption is that there are 2 WALs per RS and only one of them 
is active.  So how do you plan to make the interface for this?  That is, do you 
have plans to extend this number 2 to something more than 2?  If so, how many 
of them will be active?
The reason I am asking is that the doc says this implementation will form the 
basis for other multi-log implementations.  If that is true, then if I say 
RS.getLog(), how many logs should it return?  Currently, in testcases and in 
HRS.rollWriter(), the rolling happens only on one HLog.  But with multiWAL this 
may change.
I tried out some interfaces for HBASE-8610 in order to introduce interfaces for 
multi WAL.  A very general use case would be to have a MultiWAL per table.  If 
that model needs to fit in here, how easy would it be with the interfaces 
introduced in this JIRA?






[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864017#comment-13864017
 ] 

Nicolas Liochon commented on HBASE-10278:
-

Great doc. I'm personally very comfortable with the idea of a WAL Switching 
functionality.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-07 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864711#comment-13864711
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Thanks a lot for the reviews and comments, guys.
[~sershe] Yes, merging of seqid-mvcc would help in relaxing the limitation, but 
I think it would be good to have WAL switches compatible with 0.96/0.98 without 
any other dependency.
Yes, the switch happens when the current WAL becomes slow. It makes the new WAL 
active (no rolling of the new WAL is needed). The new WAL takes the in-flight 
edits and then starts taking newer edits. Meanwhile, the slow old WAL is rolled 
in parallel. It's not taking any writes, so no throttling is required. If the 
hiccup lasts long, the new WAL might switch too. In the future, we could use 
some heuristic to monitor switches (current WAL size, last switch time, etc.).

[~ram_krish]:
bq. the log roll for WAL A has to be completed by blocking all writes?
WAL A is not taking any new writes at this moment, as WAL B is the active one. 
I don't see any writes blocked by A's rolling. Please read the above explanation 
and let me know if I am missing anything in your question.

bq.  If the rollwriter happens and at the same time we start taking writes on 
WAL B the above mentioned scenario happens. so in that case we may have out of 
order edits during log split if this RS crashes right ?.
So, the situation is that the RS crashes while switching? 
I do see duplicate edits in two WALs (as in-flight edits have to be appended on 
every switch), but I don't see out-of-order edits even in this case. Could you 
please explain how you see out-of-order edits?

Re: Other implementations:
The goal is to implement WAL Switching such that other HLog implementations 
(such as per-table MultiWAL) can re-use it. 
Yes, some refactoring is required to make test classes use HLog as an 
interface (currently, they call 14 non-interface methods on FSHLog). 
There are methods which are implementation-specific, such as rollLog, 
getNumberOfWALs, etc. An HLog client (such as the RegionServer) shouldn't really 
care about those, but implementors do. I plan to do this refactoring here.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-07 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865014#comment-13865014
 ] 

Liang Xie commented on HBASE-10278:
---

Nice design, especially considering MTTR! +100 :) The table under the 
"Switch Threshold vs N/W Hiccup" section is very impressive!
bq. Its not taking any writes, so no throttling is required. If the hiccup 
stays for long, the new WAL might switch too. In future, we could use some 
heuristic to monitor switches (current WAL size, last switch time, etc).
I think it would be better if we had a throttling config. E.g., during a long 
rack-level n/w outage, log switching across lots of RSs will put no small 
pressure on the NN, and those tiny logs are unfriendly to every HDFS op. :)
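The throttling idea above can be sketched as a policy that refuses a switch if one happened too recently. This is a minimal illustration with hypothetical class and parameter names; the actual WALSwitchPolicy interface in the patch may look quite different:

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of a throttled switch policy: switch only when a sync
 * was slower than the threshold AND the last switch is old enough. All names
 * and parameters here are illustrative, not from the HBASE-10278 patch.
 */
public class ThrottledSwitchPolicy {
  private final long syncThresholdNanos;
  private final long minSwitchIntervalNanos;
  private Long lastSwitchNanos = null;  // null until the first switch

  public ThrottledSwitchPolicy(long syncThresholdMs, long minSwitchIntervalMs) {
    this.syncThresholdNanos = TimeUnit.MILLISECONDS.toNanos(syncThresholdMs);
    this.minSwitchIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minSwitchIntervalMs);
  }

  /** Decide whether a sync of the given duration should trigger a WAL switch. */
  public synchronized boolean shouldSwitch(long syncDurationNanos, long nowNanos) {
    if (syncDurationNanos < syncThresholdNanos) {
      return false;  // sync was fast enough; no reason to switch
    }
    if (lastSwitchNanos != null && nowNanos - lastSwitchNanos < minSwitchIntervalNanos) {
      return false;  // throttle: we switched too recently (avoids tiny log files)
    }
    lastSwitchNanos = nowNanos;
    return true;
  }
}
```

A policy like this would avoid the flood of tiny logs during a long rack-level outage, at the cost of tolerating slow syncs for the throttle interval.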

In the section "Cost of HLog Rolling's Open / Close with data", I can't 
understand the results for "Open/1k write/Close 1000 files concurrently: ~300ms".
Results:
● Open/1k write/Close 1 file: ~340 ms (avg).
● Open/1k write/Close 1000 files concurrently: ~300ms
  ○ 4 sec; 568 ops take > 1 sec (2, 3, 4 sec)
  ○ 56.8%tile is > 1 sec
Is the "~300ms" an avg? Why is it smaller than in the "1 file" scenario?

Could you guys add more detail on how to handle 2+ open log 
files on the replication path?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-10 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868551#comment-13868551
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Thanks for reviewing the doc, Liang (and sorry about the delay in replying).

True, to handle longer outages (e.g., a rack going down), we could tune the 
switching policy to avoid tiny log files (e.g., take into account the number of 
append ops since the last switch, etc.).

Yes, 300ms is the avg time (the total time for 1k ops was about 30 sec). I didn't 
really dig into why it is better than the 1-file scenario, but for me the 
interesting bit was that 568/1000 ops took more than a sec.

Yes, replication needs to handle two open files. To get minimal impact on 
Replication, I am thinking of adding a separate ReplicationSource thread for 
the second WAL. But I still need to look into whether there is a better way 
to achieve this.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-01-10 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868556#comment-13868556
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

As mentioned in the doc, I will work on this feature in a separate branch and 
merge it into trunk when it is ready. 
I have created a branch on my GitHub: 
https://github.com/HimanshuVashishtha/hbase/tree/HBASE-10278



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-12-15 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247385#comment-14247385
 ] 

Sean Busbey commented on HBASE-10278:
-

Attaching a link to the 0.89-fb approach to the general problem of slow 
pipelines, which was to request a roll on slow syncs (ref [commit 
ae31cc53|https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=commit;h=ae31cc53050bdf656bf16f094c0b066eb5a0fc0e]).



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-12-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249181#comment-14249181
 ] 

stack commented on HBASE-10278:
---

[~busbey] Could do the 0.89fb tack first, before this.

How would you implement this in the new regime [~busbey]? It changes FSHLog.  You'd 
do a derivative or decorated FSHLog, SwitchingFSHLog?  Can get rid of stuff 
like the 'enabled' flag and checks.

Looking at the last patch, pity we couldn't switch to the other writer when rolling 
the log on the current writer (given rolling takes a while).  Looks like this was a 
consideration: "NOTE: Don't switch if there is an ongoing log roll. Most likely, 
this could be a redundant step."

Patch is worth a study. Pity it has to be a syncmonitor but not sure how else 
you'd do it.

On keeping around edits: one implementation, rather than appending to the WAL 
directly as we do now, instead kept the edits in a single list and then did 
bulk appends (IIRC, no advantage doing bulk appends over single appends). Edits 
stayed in the List until syncs came back to say it was ok to let them go.  IIRC, it 
was not that much slower.  It was a little more involved (it was easier just doing 
the WAL append immediately since then we were done), but it might be worth 
considering having a single list of all outstanding edits on the other side of the 
ring buffer as the store for edits in flight (downside would be extra thread 
coordination)
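The outstanding-edits idea could look roughly like this: a single ordered list of appended edits, trimmed only when a sync acknowledges a covering sequence id. All names and the long-based sequence ids are illustrative, not from the 0.89-fb or trunk code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of keeping in-flight edits in one list until their sync
 * is acknowledged. The real FSHLog sits behind a ring buffer; this only shows
 * the retain-until-synced bookkeeping.
 */
public class PendingEditsSketch {
  static class Edit {
    final long seqId;
    final String payload;
    Edit(long seqId, String payload) { this.seqId = seqId; this.payload = payload; }
  }

  // Edits appended to the WAL but not yet confirmed durable by a sync.
  private final Deque<Edit> pending = new ArrayDeque<>();

  synchronized void append(Edit e) {
    pending.addLast(e);  // appended in sequence-id order
  }

  /** A sync acked up to seqId: release covered edits; later ones stay in flight. */
  synchronized int syncCompleted(long seqId) {
    int released = 0;
    while (!pending.isEmpty() && pending.peekFirst().seqId <= seqId) {
      pending.removeFirst();
      released++;
    }
    return released;
  }

  synchronized int outstanding() { return pending.size(); }
}
```

On a writer switch, `pending` would be exactly the set of edits to re-append to the new writer, which is the appeal of the approach; the cost is the extra coordination on every append and sync ack.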




[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942083#comment-13942083
 ] 

stack commented on HBASE-10278:
---

Would be cool if we didn't have to do 20% more context switches for the 
'normal' case.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-20 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942104#comment-13942104
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Yes, I agree. Adding even one level of indirection increases the number of
context switches, as seen in the last set of experiments.

I changed the model so the SRs do the syncing directly, as they do now
(no extra pool). So there is no extra context switch in the 'normal' case.

When things go bad for the current pipeline, the SwitchMonitor does the syncing,
interrupts the existing SRs, and replaces them with new SRs.

We switch when things go bad, and it is OK to add a little cost while we are
switching because 1) it is a rare case in comparison to regular hdfs-sync
calls, and 2) it occurs when things are already bad from the HDFS pipeline point
of view, and the cost of adding new SRs is much less compared to a bad pipeline.

On the other hand, things become interesting since SFs are reusable objects
(each is tied to an rs-handler), AND both the SwitchMonitor and a SyncRunner race
to mark an SF as done. As soon as it is done, the handler puts it
back in the RB. I am now working on handling this exact case; the rest looks
ok.
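The SF completion race described above is the classic "exactly one winner" pattern; a compare-and-set makes sure only one of the racing threads completes the future before the handler recycles it. This is an illustrative sketch, not the actual SyncFuture code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch of a reusable sync future where both a SyncRunner and a
 * monitor thread may race to complete it. The CAS guarantees a single winner,
 * so the future cannot be handed back to the ring buffer twice.
 */
public class SyncFutureSketch {
  private final AtomicBoolean done = new AtomicBoolean(false);

  /** Returns true only for the single caller that actually completed the future. */
  public boolean markDone() {
    return done.compareAndSet(false, true);
  }

  /** The handler resets the future before reusing it for the next sync. */
  public void reset() {
    done.set(false);
  }

  public boolean isDone() {
    return done.get();
  }
}
```

The loser of the `markDone()` race simply moves on, knowing the other thread has already released the handler.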







[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-20 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942257#comment-13942257
 ] 

Jonathan Hsieh commented on HBASE-10278:


Metrics would be great and could be done in a critical/must-do follow-on patch.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-21 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943504#comment-13943504
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Attached is a patch based on the new model I mentioned in my last comment.

h3. Overall design:
1. During FSHLog instantiation, open a reserved writer.
2. The SyncRunners do the sync to the file system as they do now. They 
register in an 'inflight-sync-ops' map before calling sync, and unregister 
themselves when done.
3. A monitoring thread, SyncOpsMonitor, periodically iterates over the 
inflight-sync-ops map and feeds the start time of each sync op to the configured 
WALSwitchPolicy.
4. If the switch policy decides to make the switch, it goes through the steps 
in "WAL Switch workflow".
5. If there is a concurrent log roll going on, the switch request is ignored.
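A minimal sketch of steps 2 and 3 above (SyncRunners registering in an in-flight map, and a monitor scanning it), assuming illustrative names and signatures; the map here only tracks start times, not the actual SyncFutures:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical sketch of the SyncOpsMonitor loop: SyncRunners put their sync
 * start time into a shared map before syncing and remove it when done; the
 * monitor scans the map and asks the policy whether any sync looks slow.
 */
public class SyncOpsMonitorSketch {
  /** SyncRunner id -> start time (nanos) of its currently in-flight sync call. */
  final ConcurrentMap<Long, Long> inflightSyncOps = new ConcurrentHashMap<>();

  /** Stand-in for the WALSwitchPolicy decision on one elapsed duration. */
  interface SwitchPolicy {
    boolean isSlow(long elapsedNanos);
  }

  /** One monitoring pass: returns true if any in-flight sync looks slow. */
  boolean scanOnce(SwitchPolicy policy, long nowNanos) {
    for (long startNanos : inflightSyncOps.values()) {
      if (policy.isSlow(nowNanos - startNanos)) {
        return true;  // the caller would then run the WAL Switch workflow
      }
    }
    return false;
  }
}
```

A SyncRunner would call `inflightSyncOps.put(id, System.nanoTime())` before its sync and `inflightSyncOps.remove(id)` after, so the monitor only ever sees genuinely in-flight ops.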

h4. WAL Switch workflow
1. Grab the roll-writer lock to ensure there is no concurrent log roll. The 
roll is done after switching.
2. Block the processing of the RingBufferHandler and let it reach a 'safe point'. 
A safe point is just a marker to tell that the RingBuffer is blocked at this 
sequence id. Let that sequence id be 'X'.
3. Take the 'in-flight' WALEdits (called Append ops) and SyncFutures from 
all the SyncRunners, and also from the RingBufferHandler (the latter could be in 
the process of forming a SyncFuture batch while appending WALEdits). Ensure the 
ordering of Append ops. Ignore SyncFutures with sequence id > 'X'.
4. Use the reserved writer to append-sync these in-flight edits.
5. Swap the writer with the reserved writer. 
6. Release all SyncFutures (to free up the handlers), recreate the 
SyncRunners, and interrupt the old SyncRunners.
7. Release the RingBuffer and resume normal processing. 
8. Roll the old writer.
9. Release the rollWriter lock.
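The workflow above can be reduced to the following lock/swap skeleton. The Writer type is a stand-in, and the safe-point, SyncFuture, and SyncRunner handling are elided as comments, so this is a hypothetical sketch of the control flow, not the real FSHLog code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, heavily reduced sketch of the WAL Switch workflow steps 1-9. */
public class SwitchWorkflowSketch {
  /** Stand-in for a WAL writer; real writers wrap an HDFS output stream. */
  static class Writer {
    final List<String> edits = new ArrayList<>();
  }

  final ReentrantLock rollWriterLock = new ReentrantLock();
  Writer current = new Writer();
  Writer reserved = new Writer();

  /** Returns true if the switch ran; false if a concurrent roll held the lock. */
  boolean trySwitch(List<String> inflightEdits) {
    if (!rollWriterLock.tryLock()) {
      return false;                          // step 1: a concurrent roll wins
    }
    try {
      // steps 2-3 (elided): block the ring buffer at a safe point and collect
      // the in-flight edits/sync futures from the SyncRunners.
      reserved.edits.addAll(inflightEdits);  // step 4: append-sync to reserved writer
      Writer old = current;                  // step 5: swap writers
      current = reserved;
      reserved = old;
      // steps 6-8 (elided): release sync futures, recreate SyncRunners,
      // resume the ring buffer, and roll 'old' in the background.
      return true;
    } finally {
      rollWriterLock.unlock();               // step 9
    }
  }
}
```

The key property the sketch preserves is that the roll-writer lock serializes switches against rolls, so step 5's swap can never interleave with a concurrent log roll.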


h3. Testing
I tested HLogPE against trunk on a 5-node cluster running Hadoop 2.2.
{code}
On trunk:
Performance counter stats for '/home/himanshu/dists/hbase-0.99.0-SNAPSHOT/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations 100 -threads 10':

   1891960.295558 task-clock                #    2.396 CPUs utilized
       55,076,890 context-switches          #    0.029 M/sec
        1,770,901 CPU-migrations            #    0.936 K/sec
           73,650 page-faults               #    0.039 K/sec
2,853,602,378,588 cycles                    #    1.508 GHz                     [83.32%]
2,126,410,331,760 stalled-cycles-frontend   #   74.52% frontend cycles idle    [83.31%]
1,274,582,986,073 stalled-cycles-backend    #   44.67% backend  cycles idle    [66.72%]
1,511,777,502,744 instructions              #    0.53  insns per cycle
                                            #    1.41  stalled cycles per insn [83.37%]
  264,303,859,957 branches                  #  139.698 M/sec                   [83.33%]
    7,946,652,758 branch-misses             #    3.01% of all branches         [83.33%]

    789.767027189 seconds time elapsed

Trunk + patch, with switch threshold = 1 sec:
Performance counter stats for '/home/himanshu/10278-patch/hbase-0.99.0-SNAPSHOT/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations 100 -threads 10':

   1937313.168376 task-clock                #    2.450 CPUs utilized
       54,774,802 context-switches          #    0.028 M/sec
        1,981,573 CPU-migrations            #    0.001 M/sec
           63,150 page-faults               #    0.033 K/sec
2,967,414,126,620 cycles                    #    1.532 GHz                     [83.33%]
2,198,851,794,211 stalled-cycles-frontend   #   74.10% frontend cycles idle    [83.33%]
1,394,951,252,428 stalled-cycles-backend    #   47.01% backend  cycles idle    [66.68%]
1,627,172,938,178 instructions              #    0.55  insns per cycle
                                            #    1.35  stalled cycles per insn [83.36%]
  279,686,885,670 branches                  #  144.368 M/sec                   [83.34%]
    8,362,175,551 branch-misses             #    2.99% of all branches         [83.32%]

    790.709682812 seconds time elapsed

Trunk + patch, with switch threshold = 100 ms:
Performance counter stats for '/home/himanshu/10278-patch/hbase-0.99.0-SNAPSHOT/bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLogPerformanceEvaluation -iterations 100 -threads 10':

   1926591.375141 task-clock                #    2.416 CPUs utilized
       55,231,306 context-switches          #    0.029 M/sec
        1,996,458 CPU-migrations            #    0.001 M/sec
           62,600 page-faults               #    0.032 K/sec
2,938,081,049,913 cycles                    #    1.525 GHz                     [83.34%]
2,174,078,968,852 stalled-cycles-frontend   #   74.00% frontend cycles idle    [83.31%]
1,385,993,249,374

[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-21 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943529#comment-13943529
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

I missed the license in one of the test classes. Adding it.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943535#comment-13943535
 ] 

stack commented on HBASE-10278:
---

bq.  Anything we can do to hint dfsclient?

Anything on the above?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-21 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943570#comment-13943570
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

I missed this comment, sorry. Let me take a look at the nkeywal stuff you 
mentioned. Thanks.

> Provide better write predictability
> ---
>
> Key: HBASE-10278
> URL: https://issues.apache.org/jira/browse/HBASE-10278
> Project: HBase
>  Issue Type: New Feature
>Reporter: Himanshu Vashishtha
>Assignee: Himanshu Vashishtha
> Attachments: 10278-trunk-v2.0.patch, 10278-trunk-v2.1.patch, 
> 10278-wip-1.1.patch, Multiwaldesigndoc.pdf, SwitchWriterFlow.pptx
>
>
> Currently, HBase has one WAL per region server. 
> Whenever there is any latency in the write pipeline (due to whatever reasons 
> such as n/w blip, a node in the pipeline having a bad disk, etc), the overall 
> write latency suffers. 
> Jonathan Hsieh and I analyzed various approaches to tackle this issue. We 
> also looked at HBASE-5699, which talks about adding concurrent multi WALs. 
> Along with performance numbers, we also focussed on design simplicity, 
> minimum impact on MTTR & Replication, and compatibility with 0.96 and 0.98. 
> Considering all these parameters, we propose a new HLog implementation with 
> WAL Switching functionality.
> Please find attached the design doc for the same. It introduces the WAL 
> Switching feature, and experiments/results of a prototype implementation, 
> showing the benefits of this feature.
> The second goal of this work is to serve as a building block for concurrent 
> multiple WALs feature.
> Please review the doc.
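The writer-switch idea described above can be sketched roughly as follows. This is illustrative Python only, not HBase code: the actual patch implements this in Java inside FSHLog, and the names `AggressiveSwitchPolicy` and `SwitchingWal` here are hypothetical stand-ins for the patch's WALSwitchPolicy machinery.

```python
import time


class AggressiveSwitchPolicy:
    """Requests a writer switch as soon as a single sync exceeds the threshold."""

    def __init__(self, threshold_ms):
        self.threshold_ms = threshold_ms

    def should_switch(self, last_sync_ms):
        return last_sync_ms > self.threshold_ms


class SwitchingWal:
    """Holds an active and a standby writer; flips to the standby when the policy fires."""

    def __init__(self, writers, policy):
        self.writers = writers  # [active, standby] sync callables
        self.active = 0
        self.policy = policy

    def sync(self, edits):
        start = time.monotonic()
        self.writers[self.active](edits)  # blocking sync on the current writer
        elapsed_ms = (time.monotonic() - start) * 1000.0
        if self.policy.should_switch(elapsed_ms):
            self.active = 1 - self.active  # switch to the standby writer


def stalling_writer(edits):  # simulates a slow HDFS pipeline
    time.sleep(0.05)


def healthy_writer(edits):
    pass


wal = SwitchingWal([stalling_writer, healthy_writer],
                   AggressiveSwitchPolicy(threshold_ms=10))
wal.sync(["edit-1"])  # the ~50ms sync trips the aggressive policy
print(wal.active)     # prints 1: subsequent syncs go to the standby writer
```

The aggressive policy above switches on a single slow sync; a production policy would also consider how recently the last switch happened, as the design doc suggests.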





[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943760#comment-13943760
 ] 

Hadoop QA commented on HBASE-10278:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12636108/10278-trunk-v2.1.patch
  against trunk revision .
  ATTACHMENT ID: 12636108

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 10 new 
or modified tests.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnDatanodeDeath(TestLogRolling.java:368)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/9071//console

This message is automatically generated.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-24 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946158#comment-13946158
 ] 

Jonathan Hsieh commented on HBASE-10278:


The numbers look promising and the implementation and docs are much better on 
this go.  More comments on the review board.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-03-27 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950135#comment-13950135
 ] 

stack commented on HBASE-10278:
---

Is the failure related?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-17 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972910#comment-13972910
 ] 

Jonathan Hsieh commented on HBASE-10278:


I'm going to pickup work on this issue.



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-17 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972922#comment-13972922
 ] 

Jonathan Hsieh commented on HBASE-10278:


[~saint@gmail.com] I believe it is not related.





[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-17 Thread Himanshu Vashishtha (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972937#comment-13972937
 ] 

Himanshu Vashishtha commented on HBASE-10278:
-

Jon, thanks for chiming in, but I am working on the core functionality here. 
If you are interested in helping, I would appreciate it if you could pick up 
the related tasks (as mentioned in the design doc).



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-24 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979439#comment-13979439
 ] 

Jonathan Hsieh commented on HBASE-10278:


Currently, trunk has some correctness problems when running ITMTTR (having to 
do with killing the RS hosting meta).

However, comparing trunk against a modified version with this patch applied and 
enabled by default, we see significantly worse recovery time when the target RS 
is killed.

{code}
                    kill master                       kill rs                            move regions
                    99 (AVE, STD)   99.99 (AVE, STD)  99 (AVE, STD)  99.99 (AVE, STD)    99 (AVE, STD)  99.99 (AVE, STD)
mhlog  admin        18302.5, 122.8  18302.5, 122.8    2.1, 0.3       51.9, 10.4          2.0, 0.0       34.2, 7.6
       put          5.1, 0.3        117.9, 95.3       5.1, 0.3       37647.2, 9888.0     5.6, 0.5       169.4, 30.3
       scan         2.0, 0.0        24.1, 15.7        3.7, 0.9       36131.2, 13245.5    2.0, 0.0       45.2, 14.7
trunk  admin        18557.3, 357.2  18557.3, 357.2    2.1, 0.3       41.8, 7.2           2.0, 0.0       31.7, 4.7
       put          5.4, 0.6        79.4, 92.0        5.1, 0.3      735.2, 673.7         5.0, 0.0       130.4, 13.9
       scan         2.0, 0.0        27.0, 15.8        2.4, 0.7      165.7, 138.3         2.0, 0.5       39.9, 9.4
{code}



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980454#comment-13980454
 ] 

stack commented on HBASE-10278:
---

Tell us more [~jmhsieh].  Is mhlog this patch?  Does it get stuck scanning? Why 
are the 99 and 99.99 values < AVE,STD?



[jira] [Commented] (HBASE-10278) Provide better write predictability

2014-04-25 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980752#comment-13980752
 ] 

Jonathan Hsieh commented on HBASE-10278:


mhlog is with the multihlog patch applied (on a trunk branch from last week) 
and enabled by default; trunk was yesterday's trunk.

The two numbers are the average and standard deviation over 10 consecutive 
runs of the same test. 99 is the 99th percentile and 99.99 is the 99.99th 
percentile, as reported by ITMTTR.

To interpret the most interesting result: with the patch, a higher percentage 
of operations are adversely affected by high latencies. The 99th-percentile 
put latency when killing an RS using multihlog is 5.1ms, with a stddev of 
0.3ms, but the 99.99th percentile averages 37647.2ms with a 9888.0ms stddev; 
using the old log those figures are 735.2ms and 673.7ms respectively. Next 
time I run this I'll also collect the worst cases; my guess is that they are 
the same, or that there is some constant extra latency due to two WALs in the 
multihlog case.
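For reference, the AVE/STD-of-percentiles aggregation described above can be reproduced along these lines. This is a rough sketch with synthetic latencies; ITMTTR's actual percentile computation may differ in detail.

```python
import random
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: value at ceil(pct/100 * n) in sorted order."""
    ordered = sorted(samples)
    k = min(len(ordered) - 1, max(0, int(round(pct / 100.0 * len(ordered))) - 1))
    return ordered[k]


# Synthetic stand-in for 10 consecutive runs of per-op put latencies (ms).
random.seed(0)
runs = [[random.expovariate(1 / 5.0) for _ in range(1000)] for _ in range(10)]

# One percentile value per run; the table reports their mean ("AVE") and
# standard deviation ("STD") across the 10 runs.
p99 = [percentile(r, 99) for r in runs]
print(statistics.mean(p99), statistics.stdev(p99))
```

A large STD on the 99.99 column, as seen in the kill-rs case, means the tail latency varied a lot from run to run rather than being uniformly bad.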






[jira] [Commented] (HBASE-10278) Provide better write predictability

2023-09-13 Thread Ranganath Govardhanagiri (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-10278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764997#comment-17764997
 ] 

Ranganath Govardhanagiri commented on HBASE-10278:
--

Hello [~apurtell] - can you please provide a pointer to the more recent work 
(JIRA)?
