[ https://issues.apache.org/jira/browse/FLINK-26718 ]
luoyuxia resolved FLINK-26718.
------------------------------
    Resolution: Not A Problem

[~kunghsu] I think we can close this issue. Feel free to reopen it if you still have questions about it.

> Limitations of flink+hive dimension table
> -----------------------------------------
>
>                 Key: FLINK-26718
>                 URL: https://issues.apache.org/jira/browse/FLINK-26718
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.12.7
>            Reporter: kunghsu
>            Priority: Major
>              Labels: HIVE
>
> The scenario is a join between a Kafka source table and a Hive dimension table. The Hive table holds user data, and the data volume is very large.
> When the Hive table is small, a few hundred rows or so, everything works: the partitions are recognized automatically and the whole job runs normally.
> Once the Hive table reached about 1.3 million rows, the TaskManager stopped working properly; it became difficult even to read the logs. My guess is that the JVM ran out of memory when Flink tried to load the entire table into memory. The TaskManager reports heartbeat timeout exceptions (Heartbeat TimeoutException). Increasing the parallelism did not help either.
> Official website documentation:
> [https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference]
> So my question: does flink+hive simply not support joining against large dimension tables?
> Is this solution unusable once the data volume gets too large?
>
> A rough estimate: how much memory would 25 million rows take?
> Assuming one row is about 1 KB, 25 million rows come to about 25,000,000 KB, i.e. roughly 25 GB.
> Would setting the TaskManager memory to 32 GB solve the problem?
> Apparently not, because only roughly 16 GB of that ends up as JVM heap.
> Assuming the official solution can handle volumes this large, how should the TaskManager memory be configured?
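For context on the failure mode, here is a minimal Flink SQL sketch of the Kafka-to-Hive lookup (temporal) join described above. The table and column names (orders, dim_users, user_id, user_name), topic, broker address, and format are hypothetical placeholders; the FOR SYSTEM_TIME AS OF syntax and the 'lookup.join.cache.ttl' table property are part of the Flink 1.12 Hive connector.

{code:sql}
-- Kafka source table (names and connector options are placeholders).
CREATE TABLE orders (
  order_id  STRING,
  user_id   STRING,
  proc_time AS PROCTIME()  -- processing-time attribute required by the temporal join
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  'format' = 'json'
);

-- dim_users is assumed to be an existing Hive table registered via the HiveCatalog.
-- Its per-TaskManager cache lifetime can be tuned with a table property such as
--   'lookup.join.cache.ttl' = '12 h'
SELECT o.order_id, u.user_name
FROM orders AS o
JOIN dim_users FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
{code}

In this mode each join subtask caches its own full copy of the Hive table in heap memory, which is consistent with the symptoms above: the memory requirement scales with the table size, not with the parallelism, so raising the parallelism cannot relieve it.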
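On the sizing question at the end, here is a sketch of the relevant flink-conf.yaml settings, assuming Flink's default memory model. With taskmanager.memory.process.size set to 32g and the default managed-memory fraction of 0.4, only roughly 17 GB is derived as task heap, which matches the ~16 GB observation above; a streaming job that does not use the RocksDB state backend leaves managed memory idle, so shrinking the fraction hands most of it back to the heap.

{code:yaml}
# flink-conf.yaml -- a sizing sketch, not a verified configuration
taskmanager.memory.process.size: 32g    # total memory of the TaskManager process
# Default managed fraction is 0.4; lowering it enlarges the derived JVM heap.
taskmanager.memory.managed.fraction: 0.1
{code}

With these values the derived task heap comes out around 26 GB, which would just about hold the ~25 GB estimated above, but with essentially no headroom for the rest of the job.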