[jira] [Comment Edited] (LOG4J2-1305) Binary Layout

Gary Gregory (JIRA) Wed, 02 Mar 2016 23:12:31 -0800

    [ 
https://issues.apache.org/jira/browse/LOG4J2-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177388#comment-15177388
 ]


Gary Gregory edited comment on LOG4J2-1305 at 3/3/16 7:11 AM:
--------------------------------------------------------------

I just added the thread priority (int) to the LogEvent; we should have it here 
as well.


was (Author: garydgregory):
I just added the thread priority to the LogEvent, we should have it here as 
well.

> Binary Layout
> -------------
>
>                 Key: LOG4J2-1305
>                 URL: https://issues.apache.org/jira/browse/LOG4J2-1305
>             Project: Log4j 2
>          Issue Type: New Feature
>          Components: Layouts
>            Reporter: Remko Popma
>              Labels: binary
>
> Logging in a binary format instead of in text can give large performance 
> improvements. 
> Logging text means going from a LogEvent object to formatted text, and then 
> converting this text to bytes. Performance investigations with text-based 
> logging formats like PatternLayout (see LOG4J2-930), and encoding Strings to 
> bytes (LOG4J2-935, LOG4J2-1151) suggest that formatting and encoding text is 
> expensive and imposes limits on the performance that can be achieved. 
> A different approach would be to convert the LogEvent to a binary 
> representation directly without creating a text representation first. This 
> would result in extremely compact log files that are fast to write. The 
> trade-off is that a binary log cannot easily be read in a general-purpose 
> editor like VI or Notepad. A specialized tool would be necessary to either 
> display or convert to human-readable form. 
> This ticket proposes a simple BinaryLayout, where each LogEvent is logged in 
> a binary format.
> *Example BinaryLayout format*
> ||Offset||Type||Description||
> |0|long|TimeMillis|
> |8|long|NanoTime|
> |16|int|Level|
> |20|int|Logger name index - string value in separate file|
> |24|int|Thread name index - string value in separate file|
> |28|long|Thread ID|
> |36|int|Marker index - value & hierarchy in separate file|
> |40|int|Message length|
> |44|int|Message type: 0=text, 1+=custom message type|
> |48|byte[]|Message data - below offset assumes 12 bytes of message data|
> |60|int|Throwable data length|
> |64|byte[]|Throwable data - below offset assumes 16 bytes of Throwable data|
> |80|int|ThreadContext key/value pair count|
> |84|int|ThreadContext key index - string value in separate file|
> |88|int|ThreadContext value index - string value in separate file|
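For illustration, a minimal sketch of how one record could be written at the offsets in the example table above. The class and method names are hypothetical and not part of Log4j. It assumes big-endian byte order (still TBD in the ticket) and that the string indexes have already been resolved by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Log4j API. Field order follows the example table.
final class BinaryRecordSketch {

    // contextIndexes holds alternating key index / value index entries.
    static ByteBuffer encode(long timeMillis, long nanoTime, int level,
                             int loggerNameIndex, int threadNameIndex, long threadId,
                             int markerIndex, int messageType, byte[] message,
                             byte[] throwable, int[] contextIndexes) {
        int fixedPart = 8 + 8 + 4 + 4 + 4 + 8 + 4 + 4 + 4; // offsets 0..47
        int size = fixedPart + message.length
                + 4 + throwable.length
                + 4 + 4 * contextIndexes.length;
        // Assumption: big-endian, since the ticket leaves byte order open.
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.BIG_ENDIAN);
        buf.putLong(timeMillis);       // offset 0
        buf.putLong(nanoTime);         // offset 8
        buf.putInt(level);             // offset 16
        buf.putInt(loggerNameIndex);   // offset 20
        buf.putInt(threadNameIndex);   // offset 24
        buf.putLong(threadId);         // offset 28
        buf.putInt(markerIndex);       // offset 36
        buf.putInt(message.length);    // offset 40
        buf.putInt(messageType);       // offset 44, 0 = text
        buf.put(message);              // offset 48
        buf.putInt(throwable.length);  // directly after the message data
        buf.put(throwable);
        buf.putInt(contextIndexes.length / 2); // key/value pair count
        for (int index : contextIndexes) {
            buf.putInt(index);
        }
        buf.flip(); // make the buffer readable from position 0
        return buf;
    }
}
```

With 12 bytes of message data, 16 bytes of Throwable data and one ThreadContext pair, this produces a 92-byte record whose variable-length fields land at the offsets 60, 64, 80, 84 and 88 shown in the table.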
> *Versioning*
> The binary file must start with a header, indicating version information and 
> perhaps schema information providing metadata on the log record. Schema 
> information may make it possible to include/exclude fields. For version 1.0, 
> the schema can either be fixed like the above example, or it could be a 
> simple bitmask for the fields mentioned above.
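To make the bitmask idea concrete, here is a rough sketch of a header that carries a version and a field bitmask. Everything in it is an assumption for illustration: the header size (one byte each for major and minor version, then a four-byte mask), the field list, and the bit assignments are hypothetical and not specified by the ticket.

```java
import java.nio.ByteBuffer;
import java.util.EnumSet;

// Hypothetical sketch, not Log4j API: a version plus a bitmask of included fields.
final class HeaderSketch {

    // Assumed field list; the bit position is simply the enum ordinal.
    enum Field {
        TIME_MILLIS, NANO_TIME, LEVEL, LOGGER_NAME, THREAD_NAME,
        THREAD_ID, MARKER, MESSAGE, THROWABLE, THREAD_CONTEXT
    }

    static int toMask(EnumSet<Field> fields) {
        int mask = 0;
        for (Field f : fields) {
            mask |= 1 << f.ordinal();
        }
        return mask;
    }

    static EnumSet<Field> fromMask(int mask) {
        EnumSet<Field> fields = EnumSet.noneOf(Field.class);
        for (Field f : Field.values()) {
            if ((mask & (1 << f.ordinal())) != 0) {
                fields.add(f);
            }
        }
        return fields;
    }

    // Assumed header: 1 byte major version, 1 byte minor version, 4 byte mask.
    static ByteBuffer writeHeader(byte major, byte minor, EnumSet<Field> fields) {
        ByteBuffer buf = ByteBuffer.allocate(6);
        buf.put(major).put(minor).putInt(toMask(fields));
        buf.flip();
        return buf;
    }
}
```

A reader would call fromMask on the stored value to learn which fields each record contains before decoding any records.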
> *Custom Messages*
> Note: custom Messages that implement the {{Encoder}} interface (introduced 
> with LOG4J2-1274) can be written in binary form directly without first being 
> converted to text (LOG4J2-506). Any specialized tool for reading binary log 
> files should handle messages of type "text" out of the box, but could have 
> some plugin mechanism for decoding custom messages.
> *Byte Order*
> TBD: Are multi-byte values like ints and longs written in big-endian or 
> little-endian order? This could be specified in the header, or we could fix 
> it to either one. Exchange protocols like ITCH tend to select a fixed byte 
> order (ITCH uses big-endian, i.e. network byte order). I like the simplicity 
> of this approach.
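As a small illustration of what the byte order choice means on the wire, the sketch below encodes the same int both ways with java.nio.ByteBuffer. The class and method names are hypothetical; ByteBuffer itself defaults to big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing how byte order changes the bytes written for an int.
final class ByteOrderSketch {

    static byte[] encodeInt(int value, ByteOrder order) {
        // array() returns the 4-byte backing array of the heap buffer
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }
}
```

For the value 0x01020304, big-endian yields the bytes 01 02 03 04 and little-endian yields 04 03 02 01. Fixing one order in the format means readers never have to branch on it.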
> *Multiple Files*
> Repeating String data like thread names, logger names, marker names and 
> ThreadContextMap keys and values are saved to a separate string-data file. 
> The main log file contains an index (the line number, zero-based) into the 
> string-data file instead of the full string. Index -1 means the String value 
> was {{null}}. The format of the string-data file can simply be: each unique 
> string on a separate line (separated by '\n' (0x0A) character). Any '\n' 
> characters embedded in the string value are Unicode escaped and written as 
> "\u000A".
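The string-data file described above could be sketched roughly as follows. This is an in-memory illustration only; the class name and methods are hypothetical, and actual file I/O and character encoding are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Log4j API: maps each unique string to a zero-based line number.
final class StringTableSketch {
    private final Map<String, Integer> indexes = new HashMap<>();
    private final List<String> lines = new ArrayList<>();

    // Returns the line index for the value, adding a new line on first use.
    // A null value maps to -1, as described in the ticket.
    int indexOf(String value) {
        if (value == null) {
            return -1;
        }
        Integer existing = indexes.get(value);
        if (existing != null) {
            return existing;
        }
        int index = lines.size();
        indexes.put(value, index);
        lines.add(escape(value));
        return index;
    }

    // Replaces each embedded line feed with a six-character Unicode escape.
    static String escape(String value) {
        return value.replace("\n", "\\u000A");
    }

    // One escaped string per line, each terminated by a line feed (0x0A).
    String fileContents() {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```

One caveat with this simple escaping: a string that already contains the literal text of the escape sequence cannot be told apart from an escaped line feed when reading the file back, so a real format would probably need to escape the backslash as well.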
> TBD: as Matt points out in the comment, Markers are special since they are 
> hierarchic. One way to deal with this is to manage a separate file to save 
> the Marker hierarchy. Another way is to do something similar to 
> PatternLayout: treat it as a String value, where the string includes 
> hierarchy information. I like the simplicity of the latter approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

