dockerzhang opened a new issue #709:
URL: https://github.com/apache/incubator-inlong/issues/709


   <p>I think the Broker's read and write performance still has considerable 
room for improvement, and we should keep iterating on the storage layer. I have 
listed some considerations below and would welcome better suggestions:</p>
   
   <p>1. Data read and write operations should take the characteristics of the 
disk and file system into account. For example, the disk uses 512-byte sectors 
as its storage unit and reads data in batches of 64k, and the file system 
evicts cached pages from memory according to certain rules. If reads and writes 
are designed around these behaviors, I believe the current TPS can be 
higher;</p>
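
   <p>As a rough illustration of item 1, the arithmetic for aligning I/O sizes 
to the 512-byte sector and the 64k batch mentioned above might look like the 
sketch below. The class and method names are hypothetical and are not taken 
from the Broker code.</p>

```java
// Hypothetical helper for illustration only; not part of the existing Broker.
class IoAlignment {
    // Sizes taken from the issue text: 512-byte sectors, 64k read batches.
    static final int SECTOR_BYTES = 512;
    static final int BATCH_BYTES = 64 * 1024;

    // Rounds n up to the next multiple of unit (unit must be positive).
    static long alignUp(long n, long unit) {
        return (n + unit - 1) / unit * unit;
    }

    // Rounds n down to the previous multiple of unit (unit must be positive).
    static long alignDown(long n, long unit) {
        return n / unit * unit;
    }
}
```

   <p>With these helpers, a write buffer could be sized as 
alignUp(payloadBytes, SECTOR_BYTES) and a read could start at 
alignDown(offset, BATCH_BYTES), so that each request covers whole units.</p>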
   
   <p>2. Storage should address fragmentation of disk space, for example by 
pre-allocating fixed-length files and reusing aged files, so that disk files 
can be read sequentially and data read and write speed improves;</p>
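
   <p>A minimal sketch of item 2, assuming segment files of one fixed size. 
SegmentPreallocator and its methods are hypothetical names. Note that 
RandomAccessFile.setLength only sets the file length and may produce a sparse 
file on some file systems, so a real implementation might also need to write 
zeros or use a platform-specific allocation call to reserve the blocks.</p>

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of the existing Broker.
class SegmentPreallocator {
    // Creates the file if needed and sets it to the fixed segment length.
    static void preallocate(Path file, long sizeBytes) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(sizeBytes);
        }
    }

    // Reuses an aged segment by renaming it. The disk space it already
    // occupies is kept, and the writer overwrites it from offset 0.
    static void recycle(Path agedFile, Path nextSegment) throws IOException {
        Files.move(agedFile, nextSegment);
    }
}
```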
   
   <p>3. The number of memory cache blocks should be configurable: the memory 
cache is currently fixed at 2 memory blocks per topic. We should allow the 
business to allocate more memory cache space based on the resources actually 
available;</p>
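
   <p>One possible shape for item 3 is sketched below. The property key 
memCacheBlockCount and the class MemCacheConfig are invented for illustration. 
Keeping the current value of 2 as both the default and the minimum is also an 
assumption.</p>

```java
import java.util.Properties;

// Hypothetical configuration reader; the key name is an assumption.
class MemCacheConfig {
    static final String KEY = "memCacheBlockCount";
    // The issue states the current fixed value is 2 blocks per topic.
    static final int DEFAULT_BLOCKS_PER_TOPIC = 2;

    // Returns the configured block count, or the default if the key is unset.
    static int blocksPerTopic(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return DEFAULT_BLOCKS_PER_TOPIC;
        }
        int n = Integer.parseInt(raw.trim());
        if (n < DEFAULT_BLOCKS_PER_TOPIC) {
            throw new IllegalArgumentException(
                KEY + " must be at least " + DEFAULT_BLOCKS_PER_TOPIC);
        }
        return n;
    }
}
```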
   
   <p>4. More efficient memory-to-disk operation: at present, a flush writes 
messages from memory to disk one by one. This could be changed to write to disk 
in batches, one memory block at a time, thereby improving storage 
efficiency;</p>
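
   <p>To illustrate item 4, the sketch below models a memory block as a single 
ByteBuffer that accumulates messages and is then written to the file in one 
pass instead of one write per message. MemBlock is a hypothetical class. The 
Broker's actual memory block layout is not described in this issue.</p>

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical memory block for illustration only.
class MemBlock {
    private final ByteBuffer buf;

    MemBlock(int capacityBytes) {
        this.buf = ByteBuffer.allocate(capacityBytes);
    }

    // Appends a message; returns false if the block has no room left.
    boolean append(byte[] message) {
        if (buf.remaining() < message.length) {
            return false;
        }
        buf.put(message);
        return true;
    }

    // Writes the whole block to the channel and resets the block for reuse.
    // Returns the number of bytes written.
    long flushTo(FileChannel channel) throws IOException {
        buf.flip();
        long written = 0;
        while (buf.hasRemaining()) {
            written += channel.write(buf);
        }
        buf.clear();
        return written;
    }
}
```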
   
   <p>5. Remove the SSD-assisted consumption feature: because SSD capacity is 
too small, SSD-backed consumption is not suitable for practical use. It should 
be removed to avoid confusing users, and the related configuration and settings 
need to be cleaned up;</p>
   
   <p>6. Stored files should carry a file header that includes data version 
information, so that later storage schemes remain seamlessly compatible with 
the data format of older versions;</p>
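
   <p>A versioned file header for item 6 could be as small as the sketch 
below. The layout (4-byte magic, 2-byte version, 2-byte header length), the 
magic value, and the class name SegmentHeader are all assumptions made for 
illustration. The issue does not specify a format.</p>

```java
import java.nio.ByteBuffer;

// Hypothetical header layout for illustration only.
class SegmentHeader {
    // Assumed magic number: the ASCII bytes "TBMQ".
    static final int MAGIC = 0x54424D51;
    // 4 bytes magic + 2 bytes version + 2 bytes header length.
    static final int HEADER_SIZE = 8;

    // Encodes a header for the given data format version (big-endian).
    static byte[] encode(short version) {
        return ByteBuffer.allocate(HEADER_SIZE)
            .putInt(MAGIC)
            .putShort(version)
            .putShort((short) HEADER_SIZE)
            .array();
    }

    // Reads the version, rejecting files that do not start with the magic.
    static short decodeVersion(byte[] header) {
        if (header.length < HEADER_SIZE) {
            throw new IllegalArgumentException("header too short");
        }
        ByteBuffer buf = ByteBuffer.wrap(header);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("bad magic, not a segment file");
        }
        return buf.getShort();
    }
}
```

   <p>Storing the header length alongside the version would let a later 
format grow the header while older readers can still skip to the data.</p>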
   
   <p>7. Add a CheckPoint mechanism: on restart, the system currently checks 
the validity of only the last file. In fact, when the system shuts down, 
several consecutive files may still have data in memory, so checking only the 
last file can easily let abnormal data get mixed into the data stream. We 
should add a CheckPoint mechanism to handle this case.</p>
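
   <p>One way to read item 7 is sketched below: if a checkpoint records the 
offset below which all data is known to be on disk, a restart can validate 
every file that extends past that offset rather than only the last one. 
CheckpointRecovery, the offset-based checkpoint, and the idea of identifying 
files by start offset are assumptions for illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recovery helper for illustration only.
class CheckpointRecovery {
    // segmentStartOffsets: start offset of each file, sorted ascending.
    // checkpointOffset: all data below this offset is known to be flushed.
    // Returns the start offsets of the files that must be validated.
    static List<Long> segmentsToValidate(List<Long> segmentStartOffsets,
                                         long checkpointOffset) {
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < segmentStartOffsets.size(); i++) {
            boolean isLast = i == segmentStartOffsets.size() - 1;
            // A file ends where the next one starts; the last file is open-ended.
            long end = isLast ? Long.MAX_VALUE : segmentStartOffsets.get(i + 1);
            if (end > checkpointOffset) {
                result.add(segmentStartOffsets.get(i));
            }
        }
        return result;
    }
}
```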
   <i>JIRA link - <a 
href="https://issues.apache.org/jira/browse/INLONG-110">[INLONG-110]</a> 
created by gosonzhang</i>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

