[ https://issues.apache.org/jira/browse/IGNITE-22757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Roman Puchkovskiy updated IGNITE-22757:
---------------------------------------
Description:

I have the following test:

{noformat}
class It1000TablesTest extends ClusterPerTestIntegrationTest {
    private static final String DEFAULT_STORAGE_ENGINE = "<default>";

    private final String storageEngine = "aipersist";

    private final int columnsCount = 200;

    @Override
    protected int initialNodes() {
        return 1;
    }

    @Test
    void test() {
        String storageProfile = DEFAULT_STORAGE_ENGINE.equals(storageEngine)
                ? DEFAULT_STORAGE_PROFILE : "default_" + storageEngine.toLowerCase();

        String zoneSql = "create zone test_zone with replicas=1, storage_profiles='" + storageProfile + "';";

        cluster.doInSession(0, session -> {
            executeUpdate(zoneSql, session);
        });

        for (int i = 0; i < 1000; i++) {
            String tableName = String.format("table%03d", i);

            String valColumns = columnNames()
                    .map(colName -> colName + " varchar(40)")
                    .collect(joining(", "));
            String tableSql = "create table " + tableName + " (key int primary key, " + valColumns + ")"
                    + " with primary_zone='TEST_ZONE', storage_profile='" + storageProfile + "';";

            String columnNames = columnNames().collect(joining(", "));
            String values = IntStream.range(0, columnsCount)
                    .mapToObj(n -> UUID.randomUUID().toString())
                    .map(s -> "'" + s + "'")
                    .collect(joining(", "));
            String insertSql = "insert into " + tableName + " (key, " + columnNames + ") values (" + i + ", " + values + ")";

            cluster.doInSession(0, session -> {
                executeUpdate(tableSql, session);
                executeUpdate(insertSql, session);
            });

            int createdTables = i + 1;
            if (createdTables % 1 == 0) {
                log.info("XXX Created " + createdTables + " tables");
            }
        }
    }

    private Stream<String> columnNames() {
        return IntStream.range(0, columnsCount)
                .mapToObj(n -> String.format("val%03d", n));
    }
}
{noformat}

It tries to create 1000 tables with 201 columns each (all sharing the same zone), inserting a row into each table right after it is created.
After creating about 200 tables, I took a heap dump; here are the top consumers of the heap:

!image-2024-07-17-14-13-29-683.png!

There are only around 200 tables, but about 20k IgniteTableImpl instances and more than 4M CatalogColumnDescriptor instances. It looks like an arithmetic progression: the sum 1+2+...+200 = (1+200)*200/2 = 20100 ≈ 20k, as if adding a new table also made a copy of every existing table.

SqlSchemaManager caches tables by the <tableId, catalogVersion> pair, so if the catalog is modified N times in a way that does not concern a given table, N copies of that table may be created in the cache (and they do get created). It seems natural to cache tables (perhaps in addition to the existing caching) by <tableId, tableVersion>.

Another problem is that, because the cache is bounded, it starts evicting older instances, which then get recreated; but those older instances are still referenced by some internal structures of Calcite, so even if tables are properly cached by <tableId, tableVersion>, duplicates will emerge once enough tables are created.

> Excessive memory usage in schema-related code in SQL
> ----------------------------------------------------
>
>                 Key: IGNITE-22757
>                 URL: https://issues.apache.org/jira/browse/IGNITE-22757
>             Project: Ignite
>          Issue Type: Bug
>          Components: sql
>            Reporter: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>         Attachments: image-2024-07-17-14-13-29-683.png
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)