[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851905#comment-15851905 ] Ismael Juma commented on KAFKA-4614: Thanks [~thetaphi]. We are aware of the issue and we are tracking it in KAFKA-4501. You will notice that we have a link to the Lucene fix there already. :)

> Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-4614
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4614
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 0.10.0.1
>            Reporter: Yuto Kawamura
>            Assignee: Yuto Kawamura
>              Labels: latency, performance
>             Fix For: 0.10.2.0
>
>         Attachments: kafka-produce-99th.png
>
>
> First, let me clarify our system environment, since I think it is important for understanding this issue:
> OS: CentOS 6
> Kernel version: 2.6.32-XX
> Filesystem used for data volume: XFS
> Java version: 1.8.0_66
> GC option: Kafka default (G1GC)
> Kafka version: 0.10.0.1
> h2. Phenomenon
> In our Kafka cluster, the usual response time for a Produce request ranges from about 1 ms at the 50th percentile to 10 ms at the 99th percentile. All topics are configured with 3 replicas and all producers are configured with {{acks=all}}, so this time includes replication latency.
> Sometimes we observe the 99th percentile latency increase to 100 ms ~ 500 ms, but in most of those cases the time-consuming part is "Remote", meaning the cause is slow replication, which is known to happen for various reasons (also an issue we are trying to improve, but out of scope here).
> However, we found a different pattern that happens rarely but steadily, 3 ~ 5 times a day on each server. The interesting part is that "RequestQueue" increased as well as "Total" and "Remote".
> At the same time, we observed that the disk read metrics (in terms of both read bytes and read time) spike at exactly the same moment. Currently we have only caught-up consumers, so this metric normally sticks to zero while all requests are served from the page cache.
> In order to investigate what Kafka is "read"ing, I employed SystemTap and wrote the following script. It traces all disk reads (only actual reads hitting the physical device) made for the data volume by the broker process.
> {code}
> global target_pid = KAFKA_PID
> global target_dev = DATA_VOLUME
>
> probe ioblock.request {
>     if (rw == BIO_READ && pid() == target_pid && devname == target_dev) {
>         t_ms = gettimeofday_ms() + 9 * 3600 * 1000 // timezone adjustment
>         printf("%s,%03d: tid = %d, device = %s, inode = %d, size = %d\n",
>                ctime(t_ms / 1000), t_ms % 1000, tid(), devname, ino, size)
>         print_backtrace()
>         print_ubacktrace()
>     }
> }
> {code}
> As a result, we could observe many logs like the one below:
> {code}
> Thu Dec 22 17:21:39 2016,209: tid = 126123, device = sdb1, inode = -1, size = 4096
>  0x81275050 : generic_make_request+0x0/0x5a0 [kernel]
>  0x81275660 : submit_bio+0x70/0x120 [kernel]
>  0xa036bcaa : _xfs_buf_ioapply+0x16a/0x200 [xfs]
>  0xa036d95f : xfs_buf_iorequest+0x4f/0xe0 [xfs]
>  0xa036db46 : _xfs_buf_read+0x36/0x60 [xfs]
>  0xa036dc1b : xfs_buf_read+0xab/0x100 [xfs]
>  0xa0363477 : xfs_trans_read_buf+0x1f7/0x410 [xfs]
>  0xa033014e : xfs_btree_read_buf_block+0x5e/0xd0 [xfs]
>  0xa0330854 : xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  0xa0330edf : xfs_btree_lookup+0xbf/0x470 [xfs]
>  0xa032456f : xfs_bmbt_lookup_eq+0x1f/0x30 [xfs]
>  0xa032628b : xfs_bmap_del_extent+0x12b/0xac0 [xfs]
>  0xa0326f34 : xfs_bunmapi+0x314/0x850 [xfs]
>  0xa034ad79 : xfs_itruncate_extents+0xe9/0x280 [xfs]
>  0xa0366de5 : xfs_inactive+0x2f5/0x450 [xfs]
>  0xa0374620 : xfs_fs_clear_inode+0xa0/0xd0 [xfs]
>  0x811affbc : clear_inode+0xac/0x140 [kernel]
>  0x811b0776 : generic_delete_inode+0x196/0x1d0 [kernel]
>  0x811b0815 : generic_drop_inode+0x65/0x80 [kernel]
>  0x811af662 : iput+0x62/0x70 [kernel]
>  0x37ff2e5347 : munmap+0x7/0x30 [/lib64/libc-2.12.so]
>  0x7ff169ba5d47 : Java_sun_nio_ch_FileChannelImpl_unmap0+0x17/0x50 [/usr/jdk1.8.0_66/jre/lib/amd64/libnio.so]
>  0x7ff269a1307e
> {code}
> We took a jstack dump of the broker process and found that tid = 126123 corresponds to a GC-related thread (hex(tid) == nid (Native Thread ID)):
> {code}
> $ grep 0x1ecab /tmp/jstack.dump
> "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x7ff278d0c800 nid=0x1ecab in Object.wait() [0x7ff17da11000]
> {code}
> In order to confirm, we enabled the {{PrintGCApplicationStoppedTime}} switch and confirmed that in total the time the broker paused is longer than usual, when
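The tid-to-nid correlation described in the report (hex(tid) == nid) can be checked directly from a shell. A small sketch, assuming the dump path {{/tmp/jstack.dump}} used in the original:

```shell
# SystemTap reports thread IDs in decimal (tid = 126123), while jstack
# reports native thread IDs in hex (nid=0x...). Convert, then grep the dump:
tid=126123
nid=$(printf '0x%x' "$tid")
echo "$nid"   # -> 0x1ecab
[ -f /tmp/jstack.dump ] && grep "nid=$nid" /tmp/jstack.dump
```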
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851904#comment-15851904 ] Jun Rao commented on KAFKA-4614: [~thetaphi], thanks for the info. Much appreciated!
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851847#comment-15851847 ] Uwe Schindler commented on KAFKA-4614: Just to inform you: Cleaner is no longer available in Java 9, so you need another hack. In cooperation with Apache Lucene, Oracle added a workaround that works differently for Java 9, as a temporary measure until unmapping is supported in Java 10. I am traveling to FOSDEM at the moment, where I will talk about this and meet Mark Reinhold to discuss the Java 10 solution. The Lucene issue with a non-reflective MethodHandle approach for unmapping in Java 7, 8 and 9 can be found here: LUCENE-6989
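For context on the hack Uwe describes: the pauses come from the JVM performing {{munmap}} only when GC finally collects the mapped buffer, so the workaround is to unmap eagerly on the caller's thread. Below is a minimal illustrative sketch of that technique, not the LUCENE-6989 code and not the Kafka patch; the class name {{EagerUnmap}} is made up. It tries the Java 9+ route ({{sun.misc.Unsafe#invokeCleaner}}) first and falls back to the Java 8 reflection hack ({{DirectBuffer.cleaner().clean()}}).

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative sketch only: unmap a MappedByteBuffer eagerly on the caller's
// thread instead of waiting for a GC safepoint to reclaim it. After unmap()
// returns, the buffer must never be accessed again (doing so crashes the JVM).
public class EagerUnmap {

    public static void unmap(MappedByteBuffer buffer) throws Exception {
        try {
            // Java 9+: sun.misc.Unsafe.invokeCleaner(ByteBuffer) performs the munmap.
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            Object unsafe = theUnsafe.get(null);
            Method invokeCleaner =
                    unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
            invokeCleaner.invoke(unsafe, buffer);
        } catch (NoSuchMethodException e) {
            // Java 8: fetch the sun.misc.Cleaner from the DirectByteBuffer
            // via reflection and run it directly.
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            cleaner.getClass().getMethod("clean").invoke(cleaner);
        }
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("offset-index", ".tmp");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.setLength(4096);
            MappedByteBuffer mmap = raf.getChannel()
                    .map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mmap.putInt(0, 42);
            unmap(mmap); // munmap runs here, not inside a GC pause
            System.out.println("unmapped");
        }
    }
}
```

The key property for this issue is that the {{munmap}} (and any XFS inode work it triggers, as in the stack trace above) happens on an application thread rather than inside a stop-the-world pause.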
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851097#comment-15851097 ] Guozhang Wang commented on KAFKA-4614: [~kawamuray] Thanks for sharing!!
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830330#comment-15830330 ] ASF GitHub Bot commented on KAFKA-4614: Github user asfgit closed the pull request at: https://github.com/apache/kafka/pull/2352
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823762#comment-15823762 ] Ismael Juma commented on KAFKA-4614: [~ewencp], sure. I assigned [~junrao] as the reviewer since the fix itself is trivial, but it would be good for him to take a look. I'll do a pass through all unfixed Core JIRAs still assigned to 0.10.2.0 towards the end of the week to ensure that we either make progress or drop them from the release.
[jira] [Commented] (KAFKA-4614) Long GC pause harming broker performance which is caused by mmap objects created for OffsetIndex
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823442#comment-15823442 ] Ewen Cheslack-Postava commented on KAFKA-4614: [~ijuma] Seems like you started tracking work on this -- can you assign a reviewer (whether yourself or someone else, even a non-committer) who can help shepherd this for 0.10.2.0, since it sounds like folks are looking to land a fix before the code freeze?
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822254#comment-15822254 ] Guozhang Wang commented on KAFKA-4614:
Thanks for this report and thorough analysis [~kawamuray]!
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820948#comment-15820948 ] Yuto Kawamura commented on KAFKA-4614:
I applied the patch to one of our production brokers, deployed it, and observed it over 24 hours. As a result, this issue disappeared completely. As I wrote in the description, it was happening at least 3 ~ 4 times a day, so I think we can say the patch solved the problem.
[~ijuma] Filed a PR. Please review if possible.
[~huxi_2b] Thanks for your comments.
> Kafka already offers a method named 'forceUnmap' that actually does this kind of thing
This is exactly what I did ;)
> If others use MappedByteBuffer later, it will crash.
Right. I think it is a somewhat dangerous solution, but I still think it's safe because access to the mmap object is limited to {{AbstractIndex}} and its subclasses. Indeed, we need to be careful when changing things around there though.
> Portability risk, although sun.misc.Cleaner is present in most of the major JVM vendors.
Right, but this class is already imported to support Windows, so I don't think this is a problem.
> System.gc()
I think this doesn't work at all. AFAIK:
- System.gc() does not guarantee that all garbage available to be collected is actually collected (the JVM makes a "best effort").
- Whether GC is triggered manually or not, it blocks application threads for the same reason.
[~miguno] Thanks.
The investigation was hard but was a great opportunity to learn many new things ;)
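The forceUnmap / DirectBuffer.cleaner().clean() approach discussed above releases the mapping eagerly on the caller's thread, so the munmap (and the XFS inode teardown it can trigger) no longer happens inside a GC pause. Below is a minimal, hedged sketch of that technique, not Kafka's actual code: it uses sun.misc.Unsafe.invokeCleaner, the JDK 9+ equivalent of the sun.misc.Cleaner call Kafka uses on JDK 8. As noted in the thread, touching the buffer after this call crashes the JVM, which is why the patch confines mmap access to AbstractIndex and its subclasses.

```java
import java.io.RandomAccessFile;
import java.lang.reflect.Field;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class ForceUnmap {
    // Forcefully releases the mapping so the munmap happens here, on the
    // caller's thread, instead of later inside a GC pause (JDK 9+).
    static void forceUnmap(MappedByteBuffer buffer) throws Exception {
        Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        sun.misc.Unsafe unsafe = (sun.misc.Unsafe) f.get(null);
        unsafe.invokeCleaner(buffer); // buffer must never be touched afterwards
    }

    public static void main(String[] args) throws Exception {
        Path index = Files.createTempFile("offset-index", ".index");
        try (RandomAccessFile raf = new RandomAccessFile(index.toFile(), "rw")) {
            MappedByteBuffer mmap =
                raf.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            mmap.putLong(0, 42L); // stand-in for an index entry
            forceUnmap(mmap);     // munmap runs now, not in a future GC cycle
        }
        Files.delete(index);      // file teardown no longer waits on the GC
        System.out.println("unmapped and deleted");
    }
}
```

The trade-offs huxi lists still apply: any later access to the unmapped buffer is a fatal error, and the Unsafe/Cleaner path is unsupported API that varies across JDK versions.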
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820924#comment-15820924 ] ASF GitHub Bot commented on KAFKA-4614:
GitHub user kawamuray opened a pull request: https://github.com/apache/kafka/pull/2352
KAFKA-4614 Forcefully unmap mmap of OffsetIndex to prevent long GC pause
Issue: https://issues.apache.org/jira/browse/KAFKA-4614
Fixes the problem that broker threads suffer long GC pauses. When the GC thread collects mmap objects that were created for index files, it unmaps the memory mapping, so the kernel proceeds to delete the file physically. This work may transparently read the file's metadata from the physical disk if it is not available in cache. This seems to happen typically with G1GC, due to its strategy of leaving garbage around for a long time if other objects in the same region are still alive. See the link for details.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kawamuray/kafka KAFKA-4614-force-munmap-for-index
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/2352.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2352
commit 8c3a4c9f5a6188c641a45cb3e111094f545f7e66
Author: Yuto Kawamura
Date: 2017-01-11T08:10:05Z
KAFKA-4614 Forcefully unmap mmap of OffsetIndex to prevent long GC pause
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15820875#comment-15820875 ] Michael Noll commented on KAFKA-4614:
[~kawamuray] Sorry to hear that you ran into this. Thanks a lot for the thorough investigation!
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817849#comment-15817849 ] huxi commented on KAFKA-4614:
Great catch on the whole thing. As for the unmap issue, it seems there is no perfect solution at the moment. Java does not offer a corresponding unmap that invokes the underlying munmap system call for the mapped file. See [the jdk bug|http://bugs.java.com/view_bug.do?bug_id=4724038]. Right now there exist two "not-that-perfect" solutions:
1. Explicitly unmap the mapped file, as [~kawamuray] said, by invoking 'DirectBuffer.cleaner().clean()'. By the way, Kafka already offers a method named 'forceUnmap' that actually does this kind of thing; we could simply reuse it. However, this solution has two drawbacks: 1. If others use the MappedByteBuffer later, it will crash. 2. Portability risk, although sun.misc.Cleaner is present in most of the major JVM vendors.
2. Explicitly invoke 'System.gc()', which could also work, provided -XX:+DisableExplicitGC is not set (with that flag, System.gc() is a no-op).
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817826#comment-15817826 ] Ismael Juma commented on KAFKA-4614:
[~kawamuray], thanks for reporting this. Looking forward to your results.
[ https://issues.apache.org/jira/browse/KAFKA-4614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15817627#comment-15817627 ] Yuto Kawamura commented on KAFKA-4614:
I've tried to fix this and am now testing the patched broker in our environment. Will submit a PR once it's confirmed to fix this problem.