[jira] [Commented] (FLINK-29647) report stackoverflow when using kryo
[ https://issues.apache.org/jira/browse/FLINK-29647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626892#comment-17626892 ]

Gao Fei commented on FLINK-29647:
---------------------------------

Our MySQL table has no primary key, so we cannot achieve idempotent writes with upsert statements.

> report stackoverflow when using kryo
> ------------------------------------
>
>                 Key: FLINK-29647
>                 URL: https://issues.apache.org/jira/browse/FLINK-29647
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Type Serialization System
>    Affects Versions: 1.13.2
>         Environment: flink 1.13.2 (kryo 2.24)
>            Reporter: Gao Fei
>            Priority: Major
>              Labels: KryoSerializer
>
> When using Kryo, a StackOverflowError is thrown:
> {code:java}
> java.lang.StackOverflowError
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:43)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
>     at com.esotericsoftware.kryo.Generics.getConcreteClass(Generics.java:44)
> {code}
> We use two-phase commit to write data to MySQL; the following is part of the MySQL sink code:
> {code:java}
> public class MySqlTwoPhaseCommitSink
>         extends TwoPhaseCommitSinkFunction<Tuple2, Connection, Void> {
>
>     private static final Logger log =
>             LoggerFactory.getLogger(MySqlTwoPhaseCommitSink.class);
>
>     public MySqlTwoPhaseCommitSink() {
>         super(new KryoSerializer<>(Connection.class, new ExecutionConfig()),
>                 VoidSerializer.INSTANCE);
>     }
>
>     @Override
>     public void invoke(Connection connection, Tuple2 tp, Context context)
>             throws Exception {
>         log.info("start invoke...");
>         // TODO: omitted here
>     }
>
>     @Override
>     public Connection beginTransaction() throws Exception {
>         log.info("start beginTransaction...");
>         String url = "jdbc:mysql://localhost:3306/bigdata?useUnicode=true&characterEncoding=UTF-8";
>         Connection connection = DBConnectUtil.getConnection(url, "root", "123456");
>         return connection;
>     }
>
>     @Override
>     public void preCommit(Connection connection) throws Exception {
>         log.info("start preCommit...");
>     }
>
>     @Override
>     public void commit(Connection connection) {
>         log.info("start commit...");
>         DBConnectUtil.commit(connection);
>     }
>
>     @Override
>     public void abort(Connection connection) {
>         log.info("start abort rollback...");
>         DBConnectUtil.rollback(connection);
>     }
> }
> {code}
> I also found a similar problem report: https://github.com/EsotericSoftware/kryo/issues/341

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
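The stack trace above points at the generics resolution in Kryo 2.x (the linked kryo issue #341 reports the same recursion). Independent of that bug, `java.sql.Connection` is not a serializable type at all: a live connection wraps sockets and driver state that cannot survive a checkpoint/restore cycle, whichever serializer is used. A minimal check:

```java
import java.io.Serializable;
import java.sql.Connection;

public class ConnectionSerializableCheck {
    public static void main(String[] args) {
        // java.sql.Connection does not extend Serializable, so handing it to
        // a generic serializer like KryoSerializer forces the serializer to
        // analyze the driver's concrete implementation class graph instead of
        // following a declared serialization contract.
        boolean serializable = Serializable.class.isAssignableFrom(Connection.class);
        System.out.println("Connection is Serializable: " + serializable);
        // prints "Connection is Serializable: false"
    }
}
```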
[jira] [Commented] (FLINK-29647) report stackoverflow when using kryo
[ https://issues.apache.org/jira/browse/FLINK-29647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626545#comment-17626545 ]

Gao Fei commented on FLINK-29647:
---------------------------------

We use two-phase commit with transactions to write data to MySQL, and we have not found a better serialization approach. When the task crashes during the commit phase and then restarts and recovers, there seems to be no way to recover the previous transaction and re-commit it. [~zhuzh] [~xtsong] Is there a good solution currently?
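The recovery problem described in this comment is commonly sidestepped by checkpointing a small serializable descriptor of the pending transaction instead of the live Connection, and reopening a connection from that descriptor on restore. The sketch below is a hypothetical illustration, not Flink or JDBC API: `MySqlTxn`, its fields, and `begin` are invented names.

```java
import java.io.Serializable;
import java.util.UUID;

// Hypothetical sketch: a plain serializable value object that stands in for
// the JDBC Connection inside TwoPhaseCommitSinkFunction state. Only simple
// String fields are checkpointed, so no serializer ever has to walk the
// JDBC driver's class graph.
public class MySqlTxn implements Serializable {
    private static final long serialVersionUID = 1L;

    public final String jdbcUrl; // where to reconnect after recovery
    public final String txnId;   // identifies the pending transaction

    public MySqlTxn(String jdbcUrl, String txnId) {
        this.jdbcUrl = jdbcUrl;
        this.txnId = txnId;
    }

    public static MySqlTxn begin(String jdbcUrl) {
        // One fresh id per beginTransaction(); the real connection would be
        // opened separately and looked up by this id.
        return new MySqlTxn(jdbcUrl, UUID.randomUUID().toString());
    }
}
```

On recovery, `commit(MySqlTxn)` would reopen a connection to `jdbcUrl` and finish (or verify) the work identified by `txnId`. Note that plain MySQL transactions generally cannot be resumed after a JVM crash, which is why exactly-once JDBC sinks usually rely on idempotent writes or XA transactions rather than re-committing an in-flight transaction.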
[jira] [Created] (FLINK-29647) report stackoverflow when using kryo
Gao Fei created FLINK-29647:
--------------------------------

             Summary: report stackoverflow when using kryo
                 Key: FLINK-29647
                 URL: https://issues.apache.org/jira/browse/FLINK-29647
             Project: Flink
          Issue Type: Bug
          Components: API / Type Serialization System
    Affects Versions: 1.13.2
         Environment: flink 1.13.2 (kryo 2.24)
            Reporter: Gao Fei
[jira] [Commented] (FLINK-25321) standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
[ https://issues.apache.org/jira/browse/FLINK-25321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17462965#comment-17462965 ]

Gao Fei commented on FLINK-25321:
---------------------------------

[~wangyang0918] OK, thanks.

> standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
> -----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25321
>                 URL: https://issues.apache.org/jira/browse/FLINK-25321
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.11.3
>         Environment: Flink 1.11.3
>                      k8s v1.21.0
>                      standalone deployment
>            Reporter: Gao Fei
>            Priority: Major
>
> We start a cluster on k8s in standalone mode: a JobManager pod (1 GB) and a TaskManager pod (3372 MB limit). The total memory of the Flink TM process is configured at 3072 MB, managed memory is 0, and everything is on the heap. The pod is always OOM-killed, and the total process memory always ends up exceeding 3072 MB. The system already uses jemalloc, so there is no 64 MB arena problem, and the application itself has not requested direct memory. It is strange that the process is still OOM-killed after a period of time.
>
> INFO [] - Final TaskExecutor Memory configuration:
> INFO [] -   Total Process Memory:          3.000gb (3221225472 bytes)
> INFO [] -     Total Flink Memory:          2.450gb (2630667464 bytes)
> INFO [] -       Total JVM Heap Memory:     2.080gb (2233382986 bytes)
> INFO [] -         Framework:               128.000mb (134217728 bytes)
> INFO [] -         Task:                    1.955gb (2099165258 bytes)
> INFO [] -       Total Off-heap Memory:     378.880mb (397284478 bytes)
> INFO [] -         Managed:                 0 bytes
> INFO [] -         Total JVM Direct Memory: 378.880mb (397284478 bytes)
> INFO [] -           Framework:             128.000mb (134217728 bytes)
> INFO [] -           Task:                  0 bytes
> INFO [] -           Network:               250.880mb (263066750 bytes)
> INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
> INFO [] -     JVM Overhead:                307.200mb (322122552 bytes)

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
[jira] [Comment Edited] (FLINK-25321) standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
[ https://issues.apache.org/jira/browse/FLINK-25321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461157#comment-17461157 ]

Gao Fei edited comment on FLINK-25321 at 12/17/21, 1:52 AM:
------------------------------------------------------------

[~wangyang0918] I tried raising the overhead fraction from the default 0.1 to 0.3, and it does work well, but how do I judge how much memory is needed? Which pieces of memory does the overhead mainly contain? There is no specific description in the documentation. I used Native Memory Tracking to track the off-heap memory; is overhead = thread + code + gc + compiler + internal + symbol? Should I set at least 1.5 GB here?

There is another problem: I configured the total memory of the TM process, but the actual process RSS always exceeds that value. Does that mean the overhead memory is only a relative value and cannot be absolutely enforced?

(taskmanager.memory.jvm-overhead.fraction: 0.3)

INFO [] - Final TaskExecutor Memory configuration:
INFO [] -   Total Process Memory:          3.000gb (3221225472 bytes)
INFO [] -     Total Flink Memory:          1.850gb (1986422336 bytes)
INFO [] -       Total JVM Heap Memory:     1.540gb (1653562372 bytes)
INFO [] -         Framework:               128.000mb (134217728 bytes)
INFO [] -         Task:                    1.415gb (1519344644 bytes)
INFO [] -       Total Off-heap Memory:     317.440mb (332859964 bytes)
INFO [] -         Managed:                 0 bytes
INFO [] -         Total JVM Direct Memory: 317.440mb (332859964 bytes)
INFO [] -           Framework:             128.000mb (134217728 bytes)
INFO [] -           Task:                  0 bytes
INFO [] -           Network:               189.440mb (198642236 bytes)
INFO [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO [] -     JVM Overhead:                921.600mb (966367680 bytes)

Native Memory Tracking:

Total: reserved=4211MB +32MB, committed=2992MB +517MB
- Java Heap (reserved=1578MB, committed=1578MB +464MB)
    (mmap: reserved=1578MB, committed=1578MB +464MB)
- Class (reserved=1103MB +2MB, committed=89MB +1MB)
    (classes #14013 -213)
    (malloc=3MB #20610 +1596)
    (mmap: reserved=1100MB +2MB, committed=87MB +1MB)
- Thread (reserved=854MB +1MB, committed=854MB +1MB)
    (thread #848 +1)
    (stack: reserved=850MB +1MB, committed=850MB +1MB)
    (malloc=3MB #5077 +6)
    (arena=1MB #1692 +2)
- Code (reserved=252MB +1MB, committed=49MB +6MB)
    (malloc=8MB +1MB #15043 +1500)
    (mmap: reserved=244MB, committed=41MB +5MB)
- GC (reserved=121MB +15MB, committed=121MB +32MB)
    (malloc=31MB +15MB #44400 +9384)
    (mmap: reserved=91MB, committed=91MB +17MB)
- Compiler (reserved=3MB, committed=3MB)
    (malloc=3MB #4000 +134)
- Internal (reserved=262MB +3MB, committed=262MB +3MB)
    (malloc=262MB +3MB #51098 +2499)
- Symbol (reserved=20MB, committed=20MB)
    (malloc=18MB #160625 -83)
    (arena=2MB #1)
- Native Memory Tracking (reserved=5MB, committed=5MB)
    (tracking overhead=5MB)
- Arena Chunk (reserved=11MB +10MB, committed=11MB +10MB)
    (malloc=11MB +10MB)
- Unknown (reserved=3MB, committed=0MB)
    (mmap: reserved=3MB, committed=0MB)
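The fraction being tuned in this thread is set in flink-conf.yaml. A sketch of the 0.3 setup discussed above, together with the min/max bounds that clamp the derived size (the min/max values shown are Flink's documented defaults; the process size matches the logs above):

```yaml
taskmanager.memory.process.size: 3072m
# JVM overhead is derived as fraction * total process size, then clamped
# into [min, max]. With the default bounds (192m / 1g), a large fraction
# can silently be capped at the max.
taskmanager.memory.jvm-overhead.fraction: 0.3
taskmanager.memory.jvm-overhead.min: 192m
taskmanager.memory.jvm-overhead.max: 1g
```

Note that JVM overhead is a budgeting line inside Flink's memory model, not a hard limit the JVM enforces, which is consistent with the observation that process RSS can still exceed the configured total.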
[jira] [Created] (FLINK-25321) standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
Gao Fei created FLINK-25321:
--------------------------------

             Summary: standalone deploy on k8s,pod always OOM killed,actual heap memory usage is normal, gc is normal
                 Key: FLINK-25321
                 URL: https://issues.apache.org/jira/browse/FLINK-25321
             Project: Flink
          Issue Type: Bug
          Components: Deployment / Kubernetes
    Affects Versions: 1.11.3
         Environment: Flink 1.11.3
                      k8s v1.21.0
                      standalone deployment
            Reporter: Gao Fei
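The derived sizes in the memory breakdowns quoted above follow a simple rule: JVM overhead is a fraction of total process memory (0.1 by default), clamped to configured min/max bounds. A small sketch of that arithmetic, as a simplification of Flink's actual memory calculator rather than its API:

```java
public class OverheadMath {
    // Simplified model of how the "JVM Overhead" line is derived:
    // overhead = clamp(processSize * fraction, min, max).
    static long jvmOverheadBytes(long processBytes, double fraction,
                                 long minBytes, long maxBytes) {
        long derived = (long) (processBytes * fraction);
        return Math.max(minBytes, Math.min(maxBytes, derived));
    }

    public static void main(String[] args) {
        long process = 3072L * 1024 * 1024; // 3.000gb total process memory
        long min = 192L * 1024 * 1024;      // default min: 192mb
        long max = 1024L * 1024 * 1024;     // default max: 1gb

        // Default fraction 0.1 -> ~307.2mb, matching the first log above
        // (up to rounding).
        System.out.println(jvmOverheadBytes(process, 0.1, min, max));
        // Fraction 0.3 -> ~921.6mb, matching the second log above
        // (up to rounding).
        System.out.println(jvmOverheadBytes(process, 0.3, min, max));
    }
}
```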
[jira] [Closed] (FLINK-22838) Flink Dashboard display incorrect Version in 1.13,actual display 1.12.2
[ https://issues.apache.org/jira/browse/FLINK-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gao Fei closed FLINK-22838.
---------------------------
    Fix Version/s:     (was: 1.13.2)
       Resolution: Not A Problem

> Flink Dashboard display incorrect Version in 1.13,actual display 1.12.2
> ------------------------------------------------------------------------
>
>                 Key: FLINK-22838
>                 URL: https://issues.apache.org/jira/browse/FLINK-22838
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Web Frontend
>    Affects Versions: 1.13.0, 1.13.1
>            Reporter: Gao Fei
>            Priority: Minor
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The Flink Dashboard shows an incorrect version in 1.13.1: it displays 1.12.2.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (FLINK-22838) Flink Dashboard display incorrect Version in 1.13,actual display 1.12.2
Gao Fei created FLINK-22838:
--------------------------------

             Summary: Flink Dashboard display incorrect Version in 1.13,actual display 1.12.2
                 Key: FLINK-22838
                 URL: https://issues.apache.org/jira/browse/FLINK-22838
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Web Frontend
    Affects Versions: 1.13.1, 1.13.0
            Reporter: Gao Fei
             Fix For: 1.13.2