[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-19358:
    Fix Version/s: (was: 1.5.0)

> Improve the stability of splitting log when do fail over
> --------------------------------------------------------
>
>                 Key: HBASE-19358
>                 URL: https://issues.apache.org/jira/browse/HBASE-19358
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR
>    Affects Versions: 0.98.24
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>             Fix For: 1.4.1, 2.0.0-beta-1, 2.0.0
>
>         Attachments: HBASE-18619-branch-2-v2.patch, HBASE-19358-branch-1-v2.patch, HBASE-19358-branch-1-v3.patch, HBASE-19358-branch-1.patch, HBASE-19358-branch-2-v3.patch, HBASE-19358-v1.patch, HBASE-19358-v4.patch, HBASE-19358-v5.patch, HBASE-19358-v6.patch, HBASE-19358-v7.patch, HBASE-19358-v8.patch, HBASE-19358.patch, split-1-log.png, split-logic-new.jpg, split-logic-old.jpg, split-table.png, split_test_result.png
>
> The way we split logs now is shown in the following figure:
> !https://issues.apache.org/jira/secure/attachment/12905027/split-logic-old.jpg!
> The problem is that the OutputSink writes the recovered edits during log splitting, which means it creates one WriterAndPath for each region and retains it until the end. If the cluster is small and the number of regions per region server is large, it creates too many HDFS streams at the same time, and this is prone to failure since each datanode needs to handle too many streams.
> Thus I came up with a new way to split the log:
> !https://issues.apache.org/jira/secure/attachment/12905028/split-logic-new.jpg!
> We try to cache all the recovered edits, but if the cache exceeds the MaxHeapUsage, we pick the largest EntryBuffer and write it to a file (closing the writer once it finishes). Then, after we have read all entries into memory, we start a writeAndCloseThreadPool, which runs a fixed number of threads to write all remaining buffers to files. Thus it will not create more HDFS streams than the *_hbase.regionserver.hlog.splitlog.writer.threads_* we set.
> The biggest benefit is that we can control the number of streams we create during log splitting: it will not exceed *_hbase.regionserver.wal.max.splitters * hbase.regionserver.hlog.splitlog.writer.threads_*, whereas before it was *_hbase.regionserver.wal.max.splitters * the number of regions the hlog contains_*.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
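The buffering scheme described above can be sketched roughly as follows. This is an illustrative model, not the actual patch: the class and field names (`BoundedEditSink`, `flushedRegions`, the string-length heap accounting) are invented for the example, and real recovered edits would go to HDFS streams rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the bounded-buffer split strategy: edits are cached per region,
// and when total heap usage exceeds a cap the largest buffer is flushed to a
// short-lived file, so the open-stream count stays bounded by the writer
// thread pool size instead of the region count.
class BoundedEditSink {
    private final long maxHeapUsage;       // analogue of MaxHeapUsage
    private long heapUsage = 0;
    final Map<String, List<String>> buffers = new HashMap<>(); // region -> cached edits
    final List<String> flushedRegions = new ArrayList<>();     // regions written early

    BoundedEditSink(long maxHeapUsage) { this.maxHeapUsage = maxHeapUsage; }

    void append(String region, String edit) {
        buffers.computeIfAbsent(region, r -> new ArrayList<>()).add(edit);
        heapUsage += edit.length();        // toy heap accounting: edit size = length
        while (heapUsage > maxHeapUsage && !buffers.isEmpty()) {
            flushLargestBuffer();
        }
    }

    // Pick the largest EntryBuffer, "write" it out, and close the writer after.
    private void flushLargestBuffer() {
        String largest = null;
        long largestSize = -1;
        for (Map.Entry<String, List<String>> e : buffers.entrySet()) {
            long size = e.getValue().stream().mapToLong(String::length).sum();
            if (size > largestSize) { largestSize = size; largest = e.getKey(); }
        }
        // In the real patch this writes the recovered edits and closes the
        // HDFS stream immediately; here we only record that a flush happened.
        flushedRegions.add(largest);
        buffers.remove(largest);
        heapUsage -= largestSize;
    }

    // After all entries are read, the remaining buffers would be written by a
    // fixed-size pool of hbase.regionserver.hlog.splitlog.writer.threads threads.
    List<String> writeAndClose() {
        List<String> written = new ArrayList<>(buffers.keySet());
        buffers.clear();
        heapUsage = 0;
        return written;
    }
}
```

Because each early flush closes its writer before the next one opens, at most one stream per writer thread is open at any moment, which is the bound the description claims.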
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-19358:
    Fix Version/s: (was: 1.3.3)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-19358:
    Fix Version/s: 1.3.3
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19358:
    Fix Version/s: (was: 2.0.0-beta-2)
                   2.0.0-beta-1
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Description: (edited; embedded attachment image links updated)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Description: (edited; embedded attachment image links updated)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Attachment: split_test_result.png
                split-1-log.png
                split-logic-new.jpg
                split-logic-old.jpg
                split-table.png
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Attachment: HBASE-19358-branch-2-v3.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Attachment: (was: HBASE-19358-branch-2-v3.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-19358:
    Fix Version/s: (was: 3.0.0)
                   (was: 2.0.0)
                   2.0.0-beta-2
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358:
    Attachment: (was: split-table.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-branch-2-v3.patch > Improve the stability of splitting log when do fail over > > > Key: HBASE-19358 > URL: https://issues.apache.org/jira/browse/HBASE-19358 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 0.98.24 >Reporter: Jingyun Tian >Assignee: Jingyun Tian > Fix For: 2.0.0, 3.0.0, 1.4.1, 1.5.0 > > Attachments: HBASE-18619-branch-2-v2.patch, > HBASE-19358-branch-1-v2.patch, HBASE-19358-branch-1-v3.patch, > HBASE-19358-branch-1.patch, HBASE-19358-branch-2-v3.patch, > HBASE-19358-v1.patch, HBASE-19358-v4.patch, HBASE-19358-v5.patch, > HBASE-19358-v6.patch, HBASE-19358-v7.patch, HBASE-19358-v8.patch, > HBASE-19358.patch > > > The way we splitting log now is like the following figure: > !https://issues.apache.org/jira/secure/attachment/12904506/split-logic-old.jpg! > The problem is the OutputSink will write the recovered edits during splitting > log, which means it will create one WriterAndPath for each region and retain > it until the end. If the cluster is small and the number of regions per rs is > large, it will create too many HDFS streams at the same time. Then it is > prone to failure since each datanode need to handle too many streams. > Thus I come up with a new way to split log. > !https://issues.apache.org/jira/secure/attachment/12904507/split-logic-new.jpg! > We try to cache all the recovered edits, but if it exceeds the MaxHeapUsage, > we will pick the largest EntryBuffer and write it to a file (close the writer > after finish). Then after we read all entries into memory, we will start a > writeAndCloseThreadPool, it starts a certain number of threads to write all > buffers to files. Thus it will not create HDFS streams more than > *_hbase.regionserver.hlog.splitlog.writer.threads_* we set. 
> The biggest benefit is that we can control the number of streams we create during log splitting: it will not exceed *_hbase.regionserver.wal.max.splitters * hbase.regionserver.hlog.splitlog.writer.threads_*, whereas before it was *_hbase.regionserver.wal.max.splitters * the number of regions the hlog contains_*.
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
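The buffering scheme described above can be sketched as follows. This is an illustrative sketch only, not the actual HBase patch: the class and member names (BoundedSplitSketch, EntryBuffer, maxHeapUsage, flush, etc.) are simplified stand-ins for the structures the description mentions, and edits are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BoundedSplitSketch {
    // One in-memory buffer of recovered edits per region (hypothetical stand-in).
    static class EntryBuffer {
        final String region;
        final List<String> edits = new ArrayList<>();
        long heapBytes = 0;
        EntryBuffer(String region) { this.region = region; }
        void add(String edit) { edits.add(edit); heapBytes += edit.length(); }
    }

    final Map<String, EntryBuffer> buffers = new HashMap<>();
    long totalHeap = 0;
    final long maxHeapUsage;   // stands in for MaxHeapUsage
    final int writerThreads;   // stands in for hbase.regionserver.hlog.splitlog.writer.threads

    BoundedSplitSketch(long maxHeapUsage, int writerThreads) {
        this.maxHeapUsage = maxHeapUsage;
        this.writerThreads = writerThreads;
    }

    // Cache an edit; while the heap limit is exceeded, evict the largest buffer.
    void append(String region, String edit) {
        buffers.computeIfAbsent(region, EntryBuffer::new).add(edit);
        totalHeap += edit.length();
        while (totalHeap > maxHeapUsage && buffers.size() > 1) {
            EntryBuffer largest = Collections.max(
                buffers.values(), Comparator.comparingLong(b -> b.heapBytes));
            flush(largest);                // write to a file, then close the writer
            totalHeap -= largest.heapBytes;
            buffers.remove(largest.region);
        }
    }

    void flush(EntryBuffer buf) {
        // In the real patch this would open a writer, write the edits, and close it.
    }

    // After all entries are read, drain the remaining buffers with a bounded pool,
    // so at most writerThreads streams are open concurrently.
    void writeAndClose() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(writerThreads);
        for (EntryBuffer buf : buffers.values()) {
            pool.submit(() -> flush(buf));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

The fixed-size pool is what caps the stream count: with, say, hbase.regionserver.wal.max.splitters = 2 and hbase.regionserver.hlog.splitlog.writer.threads = 3, at most 2 × 3 = 6 streams are open at once, instead of 2 × (regions per hlog) as before.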
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split_test_result.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-old.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-new.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-1-log.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: HBASE-18619-branch-2.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: HBASE-18619-branch-2-v2.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited to update the attachment image links)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited to update the attachment image links)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-table.png, split-logic-new.jpg, split-logic-old.jpg
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-1-log.png, split_test_result.png
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-19358: --- Fix Version/s: 3.0.0
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-19358: --- Resolution: Fixed Fix Version/s: 1.5.0 1.4.1 2.0.0 Status: Resolved (was: Patch Available)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-19358: -- Attachment: HBASE-18619-branch-2-v2.patch Pushed into master and branch-1. Reattach v2 patch for branch-2 to check HadoopQA
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v8.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: HBASE-19358-v8.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-18619-branch-2-v2.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-branch-1-v3.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-branch-1-v2.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: HBASE-19358-branch-1.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-branch-1.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-18619-branch-2.patch HBASE-19358-branch-1.patch HBASE-19358-v8.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v7.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v6.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: HBASE-19358-v5.patch)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v5.patch > Improve the stability of splitting log when do fail over > > > Key: HBASE-19358 > URL: https://issues.apache.org/jira/browse/HBASE-19358 > Project: HBase > Issue Type: Improvement > Components: MTTR >Affects Versions: 0.98.24 >Reporter: Jingyun Tian >Assignee: Jingyun Tian > Attachments: HBASE-19358-v1.patch, HBASE-19358-v4.patch, > HBASE-19358-v5.patch, HBASE-19358.patch > > > The way we splitting log now is like the following figure: > !https://issues.apache.org/jira/secure/attachment/12902997/split-logic-old.jpg! > The problem is the OutputSink will write the recovered edits during splitting > log, which means it will create one WriterAndPath for each region and retain > it until the end. If the cluster is small and the number of regions per rs is > large, it will create too many HDFS streams at the same time. Then it is > prone to failure since each datanode need to handle too many streams. > Thus I come up with a new way to split log. > !https://issues.apache.org/jira/secure/attachment/12902998/split-logic-new.jpg! > We try to cache all the recovered edits, but if it exceeds the MaxHeapUsage, > we will pick the largest EntryBuffer and write it to a file (close the writer > after finish). Then after we read all entries into memory, we will start a > writeAndCloseThreadPool, it starts a certain number of threads to write all > buffers to files. Thus it will not create HDFS streams more than > *_hbase.regionserver.hlog.splitlog.writer.threads_* we set. > The biggest benefit is we can control the number of streams we create during > splitting log, > it will not exceeds *_hbase.regionserver.wal.max.splitters * > hbase.regionserver.hlog.splitlog.writer.threads_*, but before it is > *_hbase.regionserver.wal.max.splitters * the number of region the hlog > contains_*. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
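The buffering scheme described above (cache edits per region, flush the largest EntryBuffer when memory runs out, then drain the rest with a bounded writer pool) can be sketched as follows. This is an illustrative model only, not HBase's actual implementation: all names (EntryBufferCache, files_written, etc.) are hypothetical, and edit sizes are simplified to one unit each.

```python
from concurrent.futures import ThreadPoolExecutor

class EntryBufferCache:
    """Caches recovered edits per region; flushes the largest buffer when memory is tight."""

    def __init__(self, max_heap_usage, writer_threads):
        self.max_heap_usage = max_heap_usage   # plays the role of MaxHeapUsage
        self.writer_threads = writer_threads   # plays the role of ...hlog.splitlog.writer.threads
        self.buffers = {}                      # region -> list of cached edits
        self.heap_used = 0                     # simplified: one unit per edit
        self.files_written = []                # each entry stands in for one recovered-edits file

    def append(self, region, edit):
        # Cache a recovered edit; if the cache exceeds the memory limit,
        # flush the largest per-region buffer early.
        self.buffers.setdefault(region, []).append(edit)
        self.heap_used += 1
        if self.heap_used > self.max_heap_usage:
            self._flush_largest()

    def _flush_largest(self):
        # Pick the largest EntryBuffer, write it to a file, and close the writer
        # immediately, so at most one extra stream is open during the read phase.
        region = max(self.buffers, key=lambda r: len(self.buffers[r]))
        edits = self.buffers.pop(region)
        self.heap_used -= len(edits)
        self.files_written.append((region, edits))   # stands in for "write file, close writer"

    def write_and_close_all(self):
        # After every entry has been read, a bounded pool of writer threads drains
        # the remaining buffers, so open streams never exceed writer_threads.
        items = list(self.buffers.items())
        self.buffers.clear()
        with ThreadPoolExecutor(max_workers=self.writer_threads) as pool:
            for f in [pool.submit(self.files_written.append, item) for item in items]:
                f.result()
```

The key property is that a writer (stream) only exists while a buffer is actively being written, so the worst case per splitter is bounded by the writer-thread count rather than by the number of regions in the hlog.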
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-table.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-new.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-old.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split_test_result.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split-1-log.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v4.patch (patch updated after reviews on Review Board)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (updated an attachment image URL in the description; text otherwise unchanged)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (updated an attachment image URL in the description; text otherwise unchanged)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-table.png
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: split_test_result.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split_test_result.png split-logic-new.jpg split-logic-old.jpg
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-1-log.png split_test_result.png
Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358-v1.patch
Jingyun Tian updated HBASE-19358: - Attachment: (was: split-table.png)
Jingyun Tian updated HBASE-19358: - Attachment: (was: split_test_result.png)
Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-old.jpg)
Jingyun Tian updated HBASE-19358: - Attachment: (was: split-logic-new.jpg)
Jingyun Tian updated HBASE-19358: - Attachment: (was: split-1-log.png)
Ted Yu updated HBASE-19358: - Status: Patch Available (was: Open)
Jingyun Tian updated HBASE-19358: - Description: (updated)
Jingyun Tian updated HBASE-19358: - Description: (updated)
Jingyun Tian updated HBASE-19358: - Description: (updated)
Jingyun Tian updated HBASE-19358: - Attachment: split-logic-new.jpg split-logic-old.jpg
Jingyun Tian updated HBASE-19358: - Attachment: HBASE-19358.patch
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; stale previousLogic.jpg / newLogic.jpg figure links removed)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: previousLogic.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: newLogic.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-table.png
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: split-1-log.png split_test_result.png
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; minor wording change)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; figure embeds switched to full attachment URLs)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; previousLogic.jpg figure link added)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; misplaced figure link removed)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; figure links adjusted)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: (edited; figure links adjusted)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: previousLogic.jpg
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: previoutLogic.jpg)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: edited to drop the stale previous-logic.png image reference.
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: newLogic.jpg previoutLogic.jpg
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: previous-logic.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: (was: newLogic.png)
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Description: edited to replace the attachment-name.jpg placeholder with the newLogic.png image reference.
[jira] [Updated] (HBASE-19358) Improve the stability of splitting log when do fail over
[ https://issues.apache.org/jira/browse/HBASE-19358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingyun Tian updated HBASE-19358: - Attachment: newLogic.png previous-logic.png