[ 
https://issues.apache.org/jira/browse/FLINK-26718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia resolved FLINK-26718.
------------------------------
    Resolution: Not A Problem

[~kunghsu] I think we can close this issue. Feel free to reopen it if you 
still have questions about it.

> Limitations of flink+hive dimension table
> -----------------------------------------
>
>                 Key: FLINK-26718
>                 URL: https://issues.apache.org/jira/browse/FLINK-26718
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>    Affects Versions: 1.12.7
>            Reporter: kunghsu
>            Priority: Major
>              Labels: HIVE
>
> Limitations of flink+hive dimension table
> The scenario I am working with is a join between a Kafka input table and a 
> Hive dimension table. The Hive dimension table holds user data, and the data 
> volume is very large.
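> For reference, a minimal sketch of the join pattern described here, assuming 
> hypothetical table names (a Kafka table kafka_orders with a processing-time 
> attribute proc_time, and a Hive table hive_users registered via a HiveCatalog):
>
>     -- Lookup/temporal join of a Kafka stream against a Hive dimension table.
>     -- In this mode Flink loads and caches the Hive table on the TaskManager,
>     -- which is why a very large table puts pressure on TaskManager memory.
>     SELECT o.order_id, o.user_id, u.user_name
>     FROM kafka_orders AS o
>     JOIN hive_users FOR SYSTEM_TIME AS OF o.proc_time AS u
>       ON o.user_id = u.user_id;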
> When the Hive table is small, around a few hundred rows, everything works: 
> the partitions are recognized automatically and the whole job runs normally.
> When the Hive table reached about 1.3 million rows, the TaskManager stopped 
> working properly, and it became very difficult even to look at the logs. I 
> suspect it exhausted the JVM memory when it tried to load the entire table 
> into memory. You can see heartbeat timeout exceptions from the TaskManager, 
> such as a heartbeat TimeoutException. Increasing the parallelism did not 
> help either.
> Official website documentation: 
> [https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference]
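> As a side note, the options covered on that page can be set from the SQL 
> client; the keys below are the documented 1.12 option names, and the values 
> are only illustrative:
>
>     -- Source parallelism inference for the Hive source (sketch, not a fix)
>     SET table.exec.hive.infer-source-parallelism=true;
>     SET table.exec.hive.infer-source-parallelism.max=1000;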
> So my question is: does Flink + Hive currently not support joining against 
> large dimension tables?
> Is this solution unusable when the data volume is too large?
> 
> As a rough estimate, how much memory would 25 million rows take up?
> Suppose one row is about 1 KB; 25 million KB is roughly 25,000 MB, or about 
> 25 GB.
> If the TaskManager memory is set to 32 GB, would that solve the problem?
> It does not seem to, because only roughly 16 GB of that can actually be 
> allocated to the JVM heap.
> Assuming the official solution can support such a large table, how should 
> the TaskManager memory be configured?
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
