[ https://issues.apache.org/jira/browse/IGNITE-20697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aleksey Plekhanov updated IGNITE-20697:
---------------------------------------
    Description: 
Currently, physical records take up most of the WAL size. But physical records 
in WAL files are required only for crash recovery, and these records are useful 
only for a short period of time (since the last checkpoint). 
The total size of physical records written between checkpoints is greater than 
the size of all pages modified between checkpoints, since we need to store a 
page snapshot record for each modified page and page delta records if a page is 
modified more than once between checkpoints.
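A rough sketch of the size argument above, in Java. All sizes here (a 4 KiB page, a 32-byte record header, a 64-byte delta payload) are hypothetical placeholders, not Ignite's real WAL record layout. The only point is that one snapshot record per modified page plus one delta record per further modification is always larger than one final page image per modified page.

```java
import java.util.List;

class PhysicalRecordSizeSketch {
    // Hypothetical sizes for illustration only; not Ignite's actual WAL record layout.
    static final int PAGE_SIZE = 4096;
    static final int RECORD_HEADER = 32;
    static final int DELTA_PAYLOAD = 64;

    /**
     * Physical record bytes for one checkpoint interval: a page snapshot record for the
     * first modification of each page, plus a delta record for every further modification.
     */
    static long physicalRecordBytes(List<Integer> modificationsPerPage) {
        long total = 0;
        for (int mods : modificationsPerPage) {
            if (mods <= 0)
                continue; // page not modified in this interval
            total += RECORD_HEADER + PAGE_SIZE;                           // page snapshot record
            total += (long) (mods - 1) * (RECORD_HEADER + DELTA_PAYLOAD); // page delta records
        }
        return total;
    }

    /** Bytes needed to keep only the final state of every modified page. */
    static long finalPageBytes(List<Integer> modificationsPerPage) {
        long modifiedPages = modificationsPerPage.stream().filter(m -> m > 0).count();
        return modifiedPages * PAGE_SIZE;
    }

    public static void main(String[] args) {
        // Four pages: modified once, 5 times, 20 times, and not at all.
        List<Integer> mods = List.of(1, 5, 20, 0);
        System.out.println("physical records: " + physicalRecordBytes(mods) + " bytes");
        System.out.println("final pages:      " + finalPageBytes(mods) + " bytes");
    }
}
```

With these placeholder sizes the example prints 14592 bytes of physical records against 12288 bytes of final page state; the gap grows with the number of modifications per page.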
We process WAL files several times in the stable workflow (without crashes or 
rebalances):
 # We write records to WAL files
 # We copy WAL files to the archive
 # We compact WAL files (remove physical records + compress)

So, in total, we write all physical records twice and read them at least twice.

To reduce disk workload, we can move physical records to another storage and 
stop writing them to WAL files. To provide the same crash recovery guarantees, 
we can write modified pages twice during a checkpoint: first to a delta file and 
then to the page storage. In this case, if we crash while writing to the page 
storage, we can recover any page from the delta file (instead of from the WAL, 
as we do now).
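A minimal in-memory sketch of the proposed double write, assuming hypothetical names (DoubleWriteCheckpointSketch, deltaFile, pageStorage, deltaFileComplete) that are not part of Ignite's API. Maps stand in for the delta file and the page storage, and a boolean stands in for an fsync plus a durable "delta file complete" marker. Replaying logical WAL records after the pages are restored is left out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class DoubleWriteCheckpointSketch {
    // Hypothetical in-memory stand-ins: page id -> page bytes.
    final Map<Long, byte[]> deltaFile = new LinkedHashMap<>();
    final Map<Long, byte[]> pageStorage = new HashMap<>();
    // Stands in for "delta file fully written and fsynced" (a durable marker in a real system).
    boolean deltaFileComplete;

    /**
     * Writes dirty pages to the delta file first, then to the page storage.
     * crashAfter simulates a crash after that many page-storage writes; -1 means no crash.
     */
    void checkpoint(Map<Long, byte[]> dirtyPages, int crashAfter) {
        // Phase 1: full copies of all dirty pages go to the delta file.
        deltaFile.clear();
        deltaFileComplete = false;
        for (Map.Entry<Long, byte[]> e : dirtyPages.entrySet())
            deltaFile.put(e.getKey(), e.getValue().clone());
        deltaFileComplete = true;

        // Phase 2: in-place writes to the page storage; a crash here can tear a page.
        int written = 0;
        for (Map.Entry<Long, byte[]> e : dirtyPages.entrySet()) {
            if (written == crashAfter) {
                byte[] page = e.getValue();
                byte[] torn = pageStorage.getOrDefault(e.getKey(), new byte[page.length]).clone();
                System.arraycopy(page, 0, torn, 0, page.length / 2); // only half of the page reached disk
                pageStorage.put(e.getKey(), torn);
                return; // "crash": the delta file is left in place
            }
            pageStorage.put(e.getKey(), e.getValue().clone());
            written++;
        }

        // Checkpoint finished: the delta file is no longer needed.
        deltaFile.clear();
        deltaFileComplete = false;
    }

    /** Crash recovery: restore every page from a complete delta file. */
    void recover() {
        if (!deltaFileComplete)
            return; // page storage was not touched since the previous checkpoint
        for (Map.Entry<Long, byte[]> e : deltaFile.entrySet())
            pageStorage.put(e.getKey(), e.getValue().clone());
        deltaFile.clear();
        deltaFileComplete = false;
        // A real implementation would then replay logical WAL records from the checkpoint onward.
    }

    public static void main(String[] args) {
        Map<Long, byte[]> dirty = new LinkedHashMap<>();
        dirty.put(1L, new byte[] {1, 1, 1, 1});
        dirty.put(2L, new byte[] {2, 2, 2, 2});
        dirty.put(3L, new byte[] {3, 3, 3, 3});

        DoubleWriteCheckpointSketch store = new DoubleWriteCheckpointSketch();
        store.checkpoint(dirty, 1); // crash while writing page 2
        System.out.println("page 2 after crash:    " + Arrays.toString(store.pageStorage.get(2L)));
        store.recover();
        System.out.println("page 2 after recovery: " + Arrays.toString(store.pageStorage.get(2L)));
    }
}
```

In this sketch the page storage is only touched after the delta file is complete, so a crash while the delta file is still being written leaves pages in the state of the previous checkpoint and nothing needs to be restored from it.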

This proposal has pros and cons.
Pros:
 - Smaller size of stored data (we don't store page delta records, only the 
final state of each page)
 - Reduced disk workload (we write all modified pages one extra time instead of 
2 writes and 2 reads of a larger amount of data)
 - Potentially reduced latency (instead of writing physical records 
synchronously during data modification, we write only logical records to the 
WAL, and physical pages are written by checkpointer threads)

Cons:
 - Increased checkpoint duration (we have to write twice the amount of data 
during a checkpoint)
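A back-of-the-envelope model of this trade-off, counting only bytes related to physical records and page writes (logical WAL records are written in both schemes and are left out). P is the size of physical records for one checkpoint interval and M is the size of the final state of the modified pages; the class and method names are hypothetical. The model assumes, as described above, that archiving reads and rewrites the physical records, that compaction reads them once more, and that the delta file is read back only after a crash.

```java
class CheckpointIoSketch {
    /** Bytes moved for one checkpoint interval, split by who moves them. */
    static final class Io {
        final long checkpointWrites; // written by checkpointer threads
        final long otherWrites;      // written outside the checkpoint (WAL, archive copy)
        final long reads;            // read back during normal operation

        Io(long checkpointWrites, long otherWrites, long reads) {
            this.checkpointWrites = checkpointWrites;
            this.otherWrites = otherWrites;
            this.reads = reads;
        }

        long totalWrites() {
            return checkpointWrites + otherWrites;
        }
    }

    /** Current scheme: physical records (P bytes) go WAL -> archive -> compaction. */
    static Io current(long physicalBytes, long modifiedPageBytes) {
        long checkpointWrites = modifiedPageBytes; // pages written to the page storage
        long otherWrites = 2 * physicalBytes;      // WAL write + copy into the archive
        long reads = 2 * physicalBytes;            // read for archiving + read for compaction
        return new Io(checkpointWrites, otherWrites, reads);
    }

    /** Proposed scheme: no physical records in the WAL, each modified page written twice. */
    static Io proposed(long modifiedPageBytes) {
        // Delta file + page storage; the delta file is read back only after a crash.
        return new Io(2 * modifiedPageBytes, 0, 0);
    }

    public static void main(String[] args) {
        long p = 14_592, m = 12_288; // hypothetical P and M, with P > M
        Io cur = current(p, m);
        Io prop = proposed(m);
        System.out.println("current:  writes=" + cur.totalWrites() + " reads=" + cur.reads
                + " checkpoint writes=" + cur.checkpointWrites);
        System.out.println("proposed: writes=" + prop.totalWrites() + " reads=" + prop.reads
                + " checkpoint writes=" + prop.checkpointWrites);
    }
}
```

With these numbers, total writes drop from 41472 to 24576 bytes and reads from 29184 to 0, while the bytes written by checkpointer threads double from 12288 to 24576, which is the checkpoint-duration cost listed above.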

Let's try to implement it and benchmark.

  was:
Currentrly, physycal records take most of the WAL size. But physical records in 
WAL files required only for crash recovery and these records are useful only 
for a short period of time (since last checkpoint). 
Size of physical records during checkpoint is more than size of all modified 
pages between checkpoints, since we need to store page snapshot record for each 
modified page and page delta records, if page is modified more than once 
between checkpoints.
We process WAL file several times in stable workflow (without crashes and 
rebalances):
 # We write records to WAL files
 # We copy WAL files to archive
 # We compact WAL files (remove phisical records + compress)

So, totally we write all physical records twice and read physical records at 
least twice.

To reduce disc workload we can move physical records to another storage and 
don't write them to WAL files. To provide the same crash recovery guarantees we 
can write modified pages twice during checkpoint. First time to some delta file 
and second time to the page storage. In this case we can recover any page if we 
crash during write to page storage from delta file (instead of WAL, as we do 
now).

This proposal has pros and cons.
Pros:
 - Less size of stored data (we don't store page delta files, only final state 
of the page)
 - Reduced disc workload (we store additionally write once all modified pages 
instead of 2 writes and 2 reads of larger amount of data)
 - Potentially reduced latency (instead of writing physical records 
synchronously during data modification we write to WAL only logical records and 
physical pages will be written by checkpointer threads)

Cons:
 - Increased checkpoint duration (we should write doubled amount of data during 
checkpoint)

Let's try to implement it and benchmark.


> Move physical records from WAL to another storage 
> --------------------------------------------------
>
>                 Key: IGNITE-20697
>                 URL: https://issues.apache.org/jira/browse/IGNITE-20697
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Aleksey Plekhanov
>            Assignee: Aleksey Plekhanov
>            Priority: Major
>              Labels: iep-113, ise
>          Time Spent: 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
