[jira] [Created] (FLINK-38644) Reading tables with String type as the primary key may cause OutOfMemory Error

Yanquan Lv (Jira) Fri, 07 Nov 2025 02:37:27 -0800

Yanquan Lv created FLINK-38644:
----------------------------------

             Summary: Reading tables with String type as the primary key may 
cause OutOfMemory Error
                 Key: FLINK-38644
                 URL: https://issues.apache.org/jira/browse/FLINK-38644
             Project: Flink
          Issue Type: Bug
            Reporter: Yanquan Lv



When using a {*}String type as the primary key{*}, {{MySqlChunkSplitter}} 
employs an {*}uneven chunking algorithm{*}. Specifically, it queries the 
{{min}} and {{max}} values of the key range, calculates {{chunkEnd}} from 
{{chunkStart}} and {{chunkSize}}, and compares {{chunkEnd}} with {{max}} 
to determine whether to proceed with the next chunk split.

However, the queries for {{min}}, {{max}}, and {{chunkEnd}} are evaluated 
under *MySQL's sorting rules* (the column's collation). In contrast, the 
comparison of {{chunkEnd}} with {{max}} that decides the chunk boundary relies 
on {*}Java's string sorting rules{*}. By default, *MySQL is case-insensitive* 
in string comparisons, while {*}Java's string sorting is case-sensitive{*}. 
This discrepancy may produce {*}incorrect chunk boundaries{*}, which can 
ultimately lead to an {*}{{OutOfMemoryError}}{*}.
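
The mismatch can be reproduced in plain Java. The following is only a minimal 
sketch: the class and method names are hypothetical, and 
{{String.CASE_INSENSITIVE_ORDER}} is used as a rough stand-in for a 
case-insensitive MySQL collation, not as an exact model of one.

```java
class CollationMismatchDemo {
    // Java's natural String ordering compares UTF-16 code units, so case matters.
    static int javaOrder(String a, String b) {
        return a.compareTo(b);
    }

    // Assumption: approximates a case-insensitive MySQL collation for ASCII keys only.
    static int caseInsensitiveOrder(String a, String b) {
        return String.CASE_INSENSITIVE_ORDER.compare(a, b);
    }

    public static void main(String[] args) {
        // 'd' is 100 and 'F' is 70 in ASCII, so Java sorts "d1" after "F2".
        System.out.println(javaOrder("d1", "F2") > 0);            // prints true
        // Ignoring case, 'd' sorts before 'f', so "d1" sorts before "F2".
        System.out.println(caseInsensitiveOrder("d1", "F2") < 0); // prints true
    }
}
```

The two orderings disagree on the same pair of keys, which is the situation 
the splitter runs into when it mixes database-side and Java-side comparisons.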



For example, in MySQL, consider a set of primary key values sorted by the 
database's collation rules as:
{{a1, A2, b1, B2, c1, C2, d1, D2, e1, E2, f1, F2}}.
Assume the {{chunkSize}} is 4. The computed {{min}}/{{max}} values would be 
{{a1}} and {{F2}}.
 * {*}First Chunk{*}: The calculated {{chunkEnd}} is {{B2}}.
 * {*}Second Chunk{*}: The calculated {{chunkEnd}} is {{d1}}.

However, under Java's lexicographical (case-sensitive) string comparison, 
{{d1}} is considered *greater than* {{F2}} (since {{'d' > 'F'}} in ASCII: 100 
vs. 70). The splitter therefore concludes that {{chunkEnd}} has already 
reached {{max}} and stops splitting. As a result:
 * The second chunk's {{chunkEnd}} becomes {{null}}.
 * The final chunks are: {{[null, B2]}} and {{[B2, null]}}.
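
The example can be traced with a small simulation. This is a simplified, 
hypothetical model and not the actual {{MySqlChunkSplitter}} code: it assumes 
the keys are already held in a list in collation order, that {{chunkEnd}} is 
the last of the {{chunkSize}} keys starting at {{chunkStart}}, and that 
splitting stops as soon as the comparator reports {{chunkEnd >= max}}.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ChunkSplitSketch {
    // keys: non-empty, pre-sorted in the database's collation order (assumption).
    // cmp: the ordering used for the "chunkEnd >= max" check.
    static List<String> split(List<String> keys, int chunkSize, Comparator<String> cmp) {
        if (keys.isEmpty() || chunkSize < 2) {
            throw new IllegalArgumentException("need keys and chunkSize >= 2");
        }
        List<String> chunks = new ArrayList<>();
        String max = keys.get(keys.size() - 1);
        String start = null; // the first chunk has no lower bound
        int startIdx = 0;
        while (true) {
            // Stand-in for the database query that computes the next chunkEnd.
            int endIdx = Math.min(startIdx + chunkSize - 1, keys.size() - 1);
            String end = keys.get(endIdx);
            if (cmp.compare(end, max) >= 0) {
                // chunkEnd is treated as null: this becomes the last, unbounded chunk.
                chunks.add("[" + start + ", null]");
                return chunks;
            }
            chunks.add("[" + start + ", " + end + "]");
            start = end;
            startIdx = endIdx;
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "a1", "A2", "b1", "B2", "c1", "C2", "d1", "D2", "e1", "E2", "f1", "F2");
        // Java's case-sensitive ordering: splitting stops early at d1.
        System.out.println(split(keys, 4, Comparator.naturalOrder()));
        // prints [[null, B2], [B2, null]]
        // Case-insensitive ordering (rough stand-in for the MySQL collation).
        System.out.println(split(keys, 4, String.CASE_INSENSITIVE_ORDER));
        // prints [[null, B2], [B2, d1], [d1, E2], [E2, null]]
    }
}
```

With a comparison that matches the database ordering, the same keys yield 
four bounded-size chunks instead of one small chunk followed by one that spans 
the rest of the table.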

Because of this inconsistency, the second chunk {{[B2, null]}} has no upper 
bound and covers every remaining row in the table. When the 
{*}TaskManager{*} reads that chunk as a single split, it may run into an 
{*}{{OutOfMemoryError}}{*}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
