[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075436#comment-16075436 ]

stack commented on HBASE-15131:
-------------------------------

Unscheduling issue not being worked on.

> Turn on multi-WAL by default
> ----------------------------
>
>                 Key: HBASE-15131
>                 URL: https://issues.apache.org/jira/browse/HBASE-15131
>             Project: HBase
>          Issue Type: New Feature
>          Components: wal
>            Reporter: Enis Soztutar
>
> Something to discuss for 2.0 or even 1.3 or 1.4. Should we turn on multi-WAL
> by default now that it has seen some production use?
> Most of the known issues have been fixed, I believe, for replication, metrics,
> etc. See HBASE-14457.
> Using bounded provider with default 4 (assuming 12 disks). Should we also
> look at the number of disks from datanode dirs and auto-configure?

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040500#comment-16040500 ]

Yu Li commented on HBASE-15131:
-------------------------------

Actually, our current max-log limit logic is incorrect with multiwal. The limit is calculated as {{(globalMemstoreSize * 2) / logRollSize}}, but it is applied per WAL inside {{AbstractFSWAL}}, whereas with multiwal it should be a global limit. We should also make it work with grouping strategies that have no fixed number of groups, such as IdentityGroupingStrategy. It's not a straightforward change, so let me open another JIRA to address it.

In production, we use BoundedGroupingStrategy and manually set the max log number for each group, so no issue occurs.

> Fix For: 2.0.0
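To make the per-WAL versus global distinction concrete, here is a minimal sketch of the arithmetic in Python. Only the formula comes from the comment above; the function names and the 4 GB / 128 MB sizes are illustrative assumptions, not HBase code or HBase defaults.

```python
MB = 1024 * 1024
GB = 1024 * MB


def per_wal_max_logs(global_memstore_size: int, log_roll_size: int) -> int:
    """Limit as quoted in the comment: (globalMemstoreSize * 2) / logRollSize."""
    return (global_memstore_size * 2) // log_roll_size


def effective_server_limit(per_wal_limit: int, num_wal_groups: int) -> int:
    """If every WAL enforces the limit on its own, the region server as a
    whole may hold num_wal_groups times as many un-archived logs."""
    return per_wal_limit * num_wal_groups


# Hypothetical sizes, chosen only to keep the numbers easy to follow.
limit = per_wal_max_logs(global_memstore_size=4 * GB, log_roll_size=128 * MB)
print("per-WAL limit:", limit)
print("single WAL, server-wide:", effective_server_limit(limit, 1))
print("4 WAL groups, server-wide:", effective_server_limit(limit, 4))
```

With a strategy that has no fixed number of groups, `num_wal_groups` is not known up front, which is presumably why the comment calls the fix non-trivial.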
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040404#comment-16040404 ]

Anoop Sam John commented on HBASE-15131:
----------------------------------------

Ya, the config seems deprecated, but we still honor it if it is configured. Otherwise we have our own math to determine the max number of WALs. Either way, the limit is still there.

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16039395#comment-16039395 ]

Enis Soztutar commented on HBASE-15131:
---------------------------------------

bq. The number of WALs has a limit so that the amount of data to replay on recovery is not too large. So with multi-WAL, should we take that into account and reduce the number of WAL files per group accordingly? Say, with 2 WALs, 16 max per group?

I thought that we had removed that limit.

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038695#comment-16038695 ]

Anoop Sam John commented on HBASE-15131:
----------------------------------------

In a mail chain on user@, Yu Li says:
{quote}
By default it's 32 which means for each RS the un-archived wal number shouldn't exceed 32. However, when multiwal enabled, it allows 32 logs for each group, thus becoming 64 wals allowed for a single RS.
{quote}
The number of WALs has a limit so that the amount of data to replay on recovery is not too large. So with multi-WAL, should we take that into account and reduce the number of WAL files per group accordingly? Say, with 2 WALs, 16 max per group?

> Fix For: 2.0.0
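The suggestion above amounts to splitting one server-wide budget across the WAL groups. A rough Python sketch follows; the helper name, the integer division, and the floor of one log per group are all assumptions made for illustration and do not describe what HBase actually does.

```python
def per_group_max_logs(server_max_logs: int, num_groups: int) -> int:
    """Split a server-wide cap on un-archived WAL files evenly across groups.

    Hypothetical helper: rounds down, and never returns less than 1 so that
    every group can still keep at least one log.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    return max(1, server_max_logs // num_groups)


# The case from the comment: a cap of 32 per region server and 2 WAL groups.
print(per_group_max_logs(32, 2))
# The bounded provider with 4 groups, as proposed in the issue description.
print(per_group_max_logs(32, 4))
```

Rounding down keeps the total at or below the server-wide cap whenever the cap is at least as large as the group count.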
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974093#comment-15974093 ]

stack commented on HBASE-15131:
-------------------------------

No dependency, true. I was just expressing what I thought was more important, but you can think multiwal is more important... no harm [~carp84]

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974038#comment-15974038 ]

Yu Li commented on HBASE-15131:
-------------------------------

I think they are two different dimensions: singleWAL vs. multiWAL, and syncWAL vs. asyncWAL. We have been running syncWAL with multiWAL online for over two years and it has proven stable; there are also some performance numbers (both benchmark and online) in HBASE-14457. And yes, it would be better with asyncWAL, but there is no dependency between the two, I guess? (smile)

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974029#comment-15974029 ]

Duo Zhang commented on HBASE-15131:
-----------------------------------

Yeah this could be done together with the async wal test.

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974017#comment-15974017 ]

stack commented on HBASE-15131:
-------------------------------

Oh, you want to see if this will help you... Maybe [~busbey] has his more recent results?

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974016#comment-15974016 ]

stack commented on HBASE-15131:
-------------------------------

Not sure [~carp84]. I'd rather work on the async DFS client-based WAL implementation first, to try and figure out why that isn't going fast.

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974006#comment-15974006 ]

Yu Li commented on HBASE-15131:
-------------------------------

Should we revive this thread for 2.0.0? [~stack] [~Apache9] could you also take a look here? Thanks.

> Fix For: 2.0.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337282#comment-15337282 ]

Mikhail Antonov commented on HBASE-15131:
-----------------------------------------

Kicked out of 1.3.

> Fix For: 2.0.0, 1.4.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107924#comment-15107924 ]

Yu Li commented on HBASE-15131:
-------------------------------

Great, look forward to your numbers sir :-)

> Fix For: 2.0.0, 1.3.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107923#comment-15107923 ]

Yu Li commented on HBASE-15131:
-------------------------------

bq. IIRC, the benchmarks from HBASE-14457 used a custom backport on top of 0.98.

You are right [~busbey]. But we are currently preparing to upgrade our cluster to 1.1.2 and have just done a comparison test between 0.98.12 and 1.1.2 with 4 WALs; the result shows comparable performance, JFYI.

And [~eclark] may have more experience using multiple WALs at Facebook. [~eclark] could you share with us? Many thanks :-)

> Fix For: 2.0.0, 1.3.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107921#comment-15107921 ]

Sean Busbey commented on HBASE-15131:
-------------------------------------

I have some more recent results, both for multiwal settings by themselves and in conjunction with ALL_SSD for the WALs. Let me try to get them cleaned up and sharable.

> Fix For: 2.0.0, 1.3.0
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107918#comment-15107918 ]

Yu Li commented on HBASE-15131:
-------------------------------

Thanks for bringing this up [~enis].

bq. Using bounded provider with default 4 (assuming 12 disks). Should we also look at the number of disks from datanode dirs and auto-configure?

FWIW, in our production cluster (for the Alibaba search engine) each machine has 3 SATA SSDs plus 9 SATA disks. We did a comparison test with the storage type of the WAL directory set to ALL_SSD, and the result is (in brief):
||WAL number||AverageLatency||
|2|4.4ms|
|4|4.2ms|
|8|4.4ms|

So we chose 4 as the best fit.

As for a pure SATA environment, we didn't run a comparison test across WAL numbers. However, [~busbey] has done great work comparing perf results with different WAL numbers in HBASE-5699, and I'd like to quote some here:
{quote}
INCREASING NUMBER OF PIPELINES
If you look at each of the {{HBASE-5699_write_iops_multiwal-X_10,50,120,190,260,330,400_threads.tiff}} charts, as we ramp up the number of writers we manage to push more overall activity through the cluster. It's not a linear gain because splitting out the pipelines means that we do more overall syncs since fewer of them get obviated by our sync grouping.
In this test, expanding from 2 to 4 or 6 pipelines didn't provide much benefit because at up to 400 concurrent sync-heavy writers we just get to maxing out the number of iops that can be done with 2 pipelines.
{quote}
Since we are still using hflush and pipeline write, I think Sean's result still applies.

> Fix For: 2.0.0, 1.3.0
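As a small worked example of how the choice of 4 follows from the latency table in the comment above, the Python sketch below picks the WAL count with the lowest average latency. The three measurements are copied from that table; the function name is hypothetical, and three data points from one cluster are of course not a general benchmark.

```python
def best_wal_count(avg_latency_ms: dict[int, float]) -> int:
    """Return the WAL count with the lowest reported average latency."""
    return min(avg_latency_ms, key=avg_latency_ms.get)


# Average latency (ms) by number of WALs, as reported with ALL_SSD storage.
reported = {2: 4.4, 4: 4.2, 8: 4.4}

best = best_wal_count(reported)
spread_ms = round(max(reported.values()) - min(reported.values()), 1)
print("best WAL count:", best)
print("spread between best and worst (ms):", spread_ms)
```

The spread between the best and worst setting is small, which fits the quoted observation that going beyond 2 pipelines brought little additional benefit in that test.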
[jira] [Commented] (HBASE-15131) Turn on multi-WAL by default
[ https://issues.apache.org/jira/browse/HBASE-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107862#comment-15107862 ]

Sean Busbey commented on HBASE-15131:
-------------------------------------

I'd like to see some perf numbers on a shipped version of HBase before we make this the default. IIRC, the benchmarks from HBASE-14457 used a custom backport on top of 0.98.

I have some numbers from a small test cluster, but I'm not sure how useful such a small scale is.

> Fix For: 2.0.0, 1.3.0