anlowee opened a new issue, #10702:
URL: https://github.com/apache/ignite/issues/10702

   Greetings! I am a master's student conducting research on performance 
diagnosis. My current focus is on studying the evolution of distributed systems 
and the occurrence of performance regression during their development. As part 
of my research, I am examining Apache Ignite as a case study.
   
   To conduct my study, I used the YCSB benchmark 
(https://github.com/brianfrankcooper/YCSB/tree/master/ignite), configured to 
load 100,000 records initially and then perform 1,000,000 update operations. I 
ran the tests with a single thread and 3 nodes on a single machine, each node 
on a different port. The results show that the average update latency 
increased by approximately 10% from version 2.7.6 to version 2.14.0 (compiled 
locally with JDK 8). I am trying to understand the cause of this increase. For 
reference, the detailed configuration of one of the nodes is included below.
   ```
   <?xml version="1.0" encoding="UTF-8"?>
     
       <!--
         Copyright (c) 2018 YCSB contributors. All rights reserved.
   
         Licensed to the Apache Software Foundation (ASF) under one or more
         contributor license agreements.  See the NOTICE file distributed with
         this work for additional information regarding copyright ownership.
         The ASF licenses this file to You under the Apache License, Version 2.0
         (the "License"); you may not use this file except in compliance with
         the License.  You may obtain a copy of the License at
              http://www.apache.org/licenses/LICENSE-2.0
         Unless required by applicable law or agreed to in writing, software
         distributed under the License is distributed on an "AS IS" BASIS,
         WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
         See the License for the specific language governing permissions and
         limitations under the License.
       -->
   
       <!--
           Ignite Spring configuration file to startup Ignite cache.
           This file demonstrates how to configure cache using Spring. Provided 
cache
           will be created on node startup.
           Use this configuration file when running HTTP REST examples (see 
'examples/rest' folder).
           When starting a standalone node, you need to execute the following 
command:
           {IGNITE_HOME}/bin/ignite.{bat|sh} examples/config/example-cache.xml
           When starting Ignite from Java IDE, pass path to this file to 
Ignition:
           Ignition.start("examples/config/example-cache.xml");
       -->
   
   <beans xmlns="http://www.springframework.org/schema/beans"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd">
   <bean id="ignite.cfg" 
class="org.apache.ignite.configuration.IgniteConfiguration">
   
           <property name="dataStorageConfiguration">
               <bean 
class="org.apache.ignite.configuration.DataStorageConfiguration">
                   <property name="walMode" value="LOG_ONLY"/>
                   <property name="storagePath" 
value="/data/dbignite2.10.0-SNAPSHOT"/>
                   <property name="walPath" 
value="/data/walignite2.10.0-SNAPSHOT"/>
                   <property name="walArchivePath" 
value="/data/walarchignite2.10.0-SNAPSHOT"/>
                   <property name="walHistorySize" value="1"/>
                   <property name="metricsEnabled" value="true"/>
   
                   <property name="defaultDataRegionConfiguration">
                       <bean 
class="org.apache.ignite.configuration.DataRegionConfiguration">
                           <property name="name" value="default_data_region"/>
                           <property name="persistenceEnabled" value="false"/>
                           <!-- Setting the max size of the default region to 
10GB. -->
                           <property name="maxSize" value="#{10L * 1024 * 1024 
* 1024}"/>
                           <!-- Setting the initial size of the default region 
to 10GB. -->
                           <property name="initialSize" value="#{10L * 1024 * 
1024 * 1024}"/>
                           <property name="checkpointPageBufferSize" 
value="#{1L * 1024 * 1024 * 1024}"/>
                           <property name="metricsEnabled" value="true"/>
                       </bean>
                   </property>
               </bean>
           </property>
   
   
           <property name="cacheConfiguration">
               <list>
                   <bean class="org.apache.ignite.configuration.CacheConfiguration">
                       <property name="name" value="usertable"/>
                       <property name="atomicityMode" value="ATOMIC"/>
                       <property name="cacheMode" value="PARTITIONED"/>
                       <property name="backups" value="1"/>
                       <property name="writeSynchronizationMode" 
value="FULL_SYNC"/>
                   </bean>
               </list>
           </property>
   
   
       <!-- Explicitly configure TCP discovery SPI to provide list of initial 
nodes. -->
       <property name="discoverySpi">
               <bean 
class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
               <property name="localPort" value="47500" />
               <property name="ipFinder">
                   <bean 
class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                       <property name="addresses">
                           <list>
                               <!--The list of hosts includes client host. -->
                               
<!--<value><hostname_or_IP>:47500..47509</value>-->
                               
<!--<value><hostname_or_IP>:47500..47509</value>-->
                               <value>10.1.0.16:47500</value>
                               <value>10.1.0.16:47501</value>
                               <value>10.1.0.16:47502</value>
                           </list>
                       </property>
                   </bean>
               </property>
           </bean>
       </property>
   </bean>
   </beans>
   ```
   
   Next, I used JFR to profile the runtime and found a significant increase 
in the profiling samples attributed to 
sun.nio.ch.EPollArrayWrapper.epollWait(). Overall, the number of method 
profiling samples increased by approximately 2000 (around 10% of v2.7.6's 
total, which matches the relative increase in latency). Within that, 
epollWait() samples increased by around 5500, while 
java.net.PlainSocketImpl.socketAccept() samples decreased by approximately 
3900. I gathered this data from ten rounds of the benchmark described above on 
an Ubuntu 18.04.1 LTS system with x86_64 architecture, 16 cores, and 132GB of 
memory.
   
   Based on these findings, I concluded that epollWait() is the primary cause 
of the latency increase, and I have since tried to narrow the issue down to 
the commit(s) responsible. I found that commit 
[1094fff](https://github.com/apache/ignite/commit/1094fff4df636fec1c807a8572597c5178a34b32)
 shows approximately 3400 more epollWait() samples than its 
[parent](https://github.com/apache/ignite/commit/5e37db2f27a02b5c31c8b7a76b18f42e9d243111).
 This accounts for around 61.82% of the total increase in epollWait() samples 
from v2.7.6 to v2.14.0. However, latency did not increase significantly 
relative to the parent, because socketAccept() samples also decreased by 
approximately 3400.
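
   As a quick check of the arithmetic above, here is a minimal sketch in 
Python. The variable and function names are illustrative only, and the inputs 
are the approximate sample counts quoted in this report, not exact 
measurements.

```python
# Approximate JFR sample-count deltas quoted in this report.
epoll_wait_total_delta = 5500      # epollWait(), v2.7.6 -> v2.14.0
socket_accept_total_delta = -3900  # socketAccept(), v2.7.6 -> v2.14.0
epoll_wait_commit_delta = 3400     # epollWait(), parent -> commit 1094fff


def share(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of the total epollWait() increase attributable to the single commit.
commit_share = share(epoll_wait_commit_delta, epoll_wait_total_delta)

# Net change across the two methods between the two releases.
net_two_methods = epoll_wait_total_delta + socket_accept_total_delta

print(commit_share)     # 61.82
print(net_two_methods)  # 1600
```

The net change of about 1600 samples across these two methods is of the same 
order as the overall increase of about 2000 samples.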
   
   At this point, I would like to understand why this particular commit caused 
such a significant increase in epollWait() samples. The commit appears to only 
disable JMX monitoring, but since I lack context, I am seeking suggestions on 
where to focus my further study. Can you please provide any advice or 
recommendations?

