gsoundar opened a new issue, #4115:
URL: https://github.com/apache/logging-log4j2/issues/4115

   ## Feature Request
   
   ### Description
   
   Add a new `log4j-iceberg` module that provides an `IcebergAppender` plugin 
for writing log events as Parquet-backed rows in an Apache Iceberg table. This 
enables structured, columnar log storage with time-travel, schema evolution, 
and partition pruning capabilities out of the box.
   
   ### Motivation
   
   Modern observability pipelines increasingly rely on data lake formats 
(Iceberg, Delta, Hudi) for log analytics due to their advantages over flat 
files:
   
   - **Columnar storage** (Parquet) enables efficient analytical queries over 
large log volumes
   - **Partition pruning** by date allows fast time-range scans without full 
table reads
   - **Schema evolution** means log schemas can be extended without rewriting 
history
   - **Time travel** enables querying historical log state at any snapshot
   - **Catalog integration** (REST, Hive, AWS Glue) provides unified metadata 
management
   
   Log4j already supports structured output to databases (JDBC, Cassandra, 
MongoDB) and message systems (Kafka, JMS). An Iceberg appender fills the gap 
for the data lake ecosystem.
   
   ### Proposed Implementation
   
   A new `log4j-iceberg` module with:
   
   - `IcebergAppender` — Log4j plugin (`<Iceberg>`) that buffers events and 
flushes them as Parquet data files
   - `IcebergManager` — Manages catalog lifecycle, table creation, buffered 
writes, and commit retry
   - Table partitioned by `event_date` (day granularity)
   - Schema validation on startup when loading existing tables
   - Configurable catalog properties for S3 credentials, REST auth, etc.
   - Exponential backoff retry on commit conflicts
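
As an illustration of the last item, the following is a minimal sketch of exponential backoff around a commit, not the module's actual code. The class `CommitRetry`, the exception `CommitConflictException`, and the parameters `maxAttempts`, `baseMillis`, and `maxMillis` are hypothetical names chosen for this example; in a real implementation the caught exception would presumably be Iceberg's own commit-failure exception.

```java
import java.util.function.Supplier;

// Hypothetical helper: none of these names come from the proposal or the PR.
final class CommitRetry {

    /** Stand-in for whatever exception the catalog raises on a commit conflict. */
    static class CommitConflictException extends RuntimeException {
        CommitConflictException(String message) {
            super(message);
        }
    }

    /** Delay before retry number {@code attempt} (0-based): base, 2*base, 4*base, ... capped at max. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Runs {@code commit}, retrying on conflicts until it succeeds or attempts run out. */
    static <T> T commitWithRetry(Supplier<T> commit, int maxAttempts,
                                 long baseMillis, long maxMillis) {
        for (int attempt = 0; ; attempt++) {
            try {
                return commit.get();
            } catch (CommitConflictException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // give up and surface the last conflict
                }
                try {
                    Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry commit", ie);
                }
            }
        }
    }
}
```

Capping the delay keeps a long run of conflicts from stalling the flush thread indefinitely; adding random jitter to each delay would further reduce the chance of several writers retrying in lockstep.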
   
   ### Configuration Example
   
   ```xml
   <Iceberg name="IcebergAppender"
            catalogName="my_catalog"
            catalogImpl="rest"
            catalogUri="http://localhost:8181"
            catalogWarehouse="s3://my-bucket/warehouse"
            tableNamespace="logs"
            tableName="app_logs"
            batchSize="1000"
            flushIntervalSeconds="30">
     <CatalogProperties>
       <Property name="s3.access-key-id">AKIA...</Property>
       <Property name="s3.secret-access-key">secret</Property>
     </CatalogProperties>
   </Iceberg>
   ```
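
One plausible reading of `batchSize` and `flushIntervalSeconds` is that a flush happens when either threshold is reached, whichever comes first. The sketch below shows that behavior under this assumption; `BatchBuffer` and its `sink` callback are hypothetical names, and the clock is passed in explicitly only to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer; the real IcebergManager may be structured differently.
final class BatchBuffer<E> {
    private final int batchSize;
    private final long flushIntervalMillis;
    private final Consumer<List<E>> sink; // e.g. "write a Parquet file and commit"
    private final List<E> pending = new ArrayList<>();
    private long lastFlushMillis;

    BatchBuffer(int batchSize, long flushIntervalSeconds, long nowMillis, Consumer<List<E>> sink) {
        this.batchSize = batchSize;
        this.flushIntervalMillis = flushIntervalSeconds * 1000L;
        this.lastFlushMillis = nowMillis;
        this.sink = sink;
    }

    /** Buffers one event and flushes if the batch is full or the interval has elapsed. */
    synchronized void add(E event, long nowMillis) {
        pending.add(event);
        boolean full = pending.size() >= batchSize;
        boolean stale = nowMillis - lastFlushMillis >= flushIntervalMillis;
        if (full || stale) {
            flush(nowMillis);
        }
    }

    /** Hands all buffered events to the sink as one batch and resets the interval timer. */
    synchronized void flush(long nowMillis) {
        lastFlushMillis = nowMillis;
        if (pending.isEmpty()) {
            return;
        }
        List<E> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }
}
```

Because this sketch only checks the interval when an event arrives, a quiet logger would never flush its tail; a real appender would likely also call `flush` from a scheduled task and on shutdown.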
   
   ### Dependencies
   
   - Apache Iceberg 1.10.1
   - Apache Parquet 1.16.0
   - Hadoop 3.4.1
   
   ### Related PR
   
   - #4104


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
