[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186353#comment-17186353 ] ShenDa commented on FLINK-19005:

[~chesnay] It looks like this problem is hard to solve. Do we have any plan or ideas for fixing it? Or is there another place where we could discuss it?

> used metaspace grow on every execution
> ---------------------------------------
>
>                 Key: FLINK-19005
>                 URL: https://issues.apache.org/jira/browse/FLINK-19005
>             Project: Flink
>          Issue Type: Bug
>          Components: Client / Job Submission, Runtime / Configuration, Runtime / Coordination
>    Affects Versions: 1.11.1
>            Reporter: Guillermo Sánchez
>            Assignee: Chesnay Schepler
>            Priority: Major
>         Attachments: heap_dump_after_10_executions.zip, heap_dump_after_1_execution.zip, heap_dump_echo_lee.tar.xz, modified-jdbc-inputformat.png, origin-jdbc-inputformat.png
>
> Hi!
> I'm running a Flink 1.11.1 cluster where I execute batch jobs built with the DataSet API.
> I submit these jobs every day to calculate daily data.
> On every execution, the cluster's used metaspace increases by 7 MB and is never released.
> This ends up in an OutOfMemoryError caused by Metaspace every 15 days, and I need to restart the cluster to clear the metaspace.
> taskmanager.memory.jvm-metaspace.size is set to 512mb.
> Any idea what could be causing this metaspace growth and why it is not released?
>
> === Summary ===
>
> Case 1, reported by [~gestevez]:
> * Flink 1.11.1
> * Java 11
> * Maximum Metaspace size set to 512mb
> * Custom Batch job, submitted daily
> * Requires restart every 15 days after an OOM
>
> Case 2, reported by [~Echo Lee]:
> * Flink 1.11.0
> * Java 11
> * G1GC
> * WordCount Batch job, submitted every second / every 5 minutes
> * eventually fails TaskExecutor with OOM
>
> Case 3, reported by [~DaDaShen]:
> * Flink 1.11.0
> * Java 11
> * WordCount Batch job, submitted every 5 seconds
> * growing Metaspace, eventually OOM
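For reference, the Metaspace limit mentioned above is set in the TaskManager's flink-conf.yaml. A minimal sketch, using the value quoted in this report (tune it to your workload):

{code}
# flink-conf.yaml (TaskManager) -- value as quoted in this report
taskmanager.memory.jvm-metaspace.size: 512m
{code}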
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186221#comment-17186221 ] ShenDa edited comment on FLINK-19005 at 8/28/20, 2:57 AM:

[~chesnay] Thanks for your detailed explanation, but I still think there may be something wrong in Flink. I find that the JdbcInputFormat & JdbcOutputFormat are a key cause of the Metaspace OOM, because java.sql.DriverManager does not release its reference to the Driver. DriverManager is loaded by the JDK's internal class loader, but the driver is loaded by the ChildFirstClassLoader, which means the ChildFirstClassLoader can never be garbage collected, according to my analysis of the dump file. I use the following code to reproduce the issue, with org.postgresql.Driver as the JDBC driver.

{code:java}
public static void main(String[] args) throws Exception {
    EnvironmentSettings envSettings = EnvironmentSettings.newInstance()
            .useBlinkPlanner()
            .inBatchMode()
            .build();
    TableEnvironment tEnv = TableEnvironment.create(envSettings);
    tEnv.executeSql(
            "CREATE TABLE " + INPUT_TABLE + "(" +
                    "id BIGINT," +
                    "timestamp6_col TIMESTAMP(6)," +
                    "timestamp9_col TIMESTAMP(6)," +
                    "time_col TIME," +
                    "real_col FLOAT," +
                    "decimal_col DECIMAL(10, 4)" +
                    ") WITH (" +
                    " 'connector.type'='jdbc'," +
                    " 'connector.url'='" + DB_URL + "'," +
                    " 'connector.table'='" + INPUT_TABLE + "'," +
                    " 'connector.username'='" + USERNAME + "'," +
                    " 'connector.password'='" + PASSWORD + "'" +
                    ")"
    );
    TableResult tableResult = tEnv.executeSql(
            "SELECT timestamp6_col, decimal_col FROM " + INPUT_TABLE);
    tableResult.collect();
}
{code}

The diagram below shows the Metaspace usage growing continuously until the TaskManager eventually goes offline.

!origin-jdbc-inputformat.png!

Additionally, I tried to fix the issue by appending the following code to closeInputFormat(), which finally allows the Metaspace to be garbage collected.

{code:java}
try {
    final Enumeration<Driver> drivers = DriverManager.getDrivers();
    while (drivers.hasMoreElements()) {
        DriverManager.deregisterDriver(drivers.nextElement());
    }
} catch (SQLException se) {
    LOG.info("Inputformat couldn't be closed - " + se.getMessage());
}
{code}

The following diagram shows that the Metaspace usage now decreases.

!modified-jdbc-inputformat.png!

So, do you think this is a Flink problem, and should we create a new issue to fix it?
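One caveat with the deregistration loop above: DriverManager.getDrivers() returns every driver visible to the caller, so blindly deregistering all of them could affect other jobs sharing the TaskManager. A more targeted variant (a sketch only, not what Flink ultimately implemented) would deregister only the drivers that were loaded by the current user-code class loader:

{code:java}
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Enumeration;

public final class JdbcDriverCleanup {

    /**
     * Deregisters only those JDBC drivers that were loaded by the given
     * class loader (e.g. Flink's user-code ChildFirstClassLoader), so that
     * drivers registered by other jobs or by the parent class loader are
     * left untouched.
     */
    public static void deregisterDriversLoadedBy(ClassLoader userCodeClassLoader) {
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            Driver driver = drivers.nextElement();
            if (driver.getClass().getClassLoader() == userCodeClassLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                } catch (SQLException e) {
                    // best-effort cleanup; nothing more can be done here
                }
            }
        }
    }
}
{code}

Called from closeInputFormat() with getClass().getClassLoader(), this would drop DriverManager's reference into the ChildFirstClassLoader while leaving drivers owned by other class loaders registered.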
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-19005:
    Attachment: modified-jdbc-inputformat.png
[jira] [Updated] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-19005:
    Attachment: origin-jdbc-inputformat.png
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183928#comment-17183928 ] ShenDa edited comment on FLINK-19005 at 8/25/20, 10:58 AM:

[~chesnay] I would like to know how you concluded from the dump files that the class leak is caused by java.sql.DriverManager; I still have no idea how to locate the key problem. BTW, I tried several times to reproduce the Metaspace OOM with the WordCount job, but this time Flink ran fine and no Metaspace OOM occurred, so that earlier report was my mistake.
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181741#comment-17181741 ] ShenDa commented on FLINK-19005:

[~chesnay] I didn't do this on a local cluster. I use a script that submits the job every 5 seconds to a standalone cluster, so I don't know how many executions it takes to trigger the OOM; with the default Metaspace configuration it takes a long time to occur. But if you observe the Metaspace usage, you'll find that the space is never released and grows continuously.
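The submission script itself is not attached to the ticket; a minimal sketch of the kind of loop described above (paths assume a standard Flink distribution and its bundled batch WordCount example) could look like this:

{code:sh}
#!/usr/bin/env bash
# Repeatedly submit the batch WordCount example to a standalone cluster
# to watch Metaspace usage on the TaskManagers grow over time.
while true; do
  ./bin/flink run ./examples/batch/WordCount.jar
  sleep 5
done
{code}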
[jira] [Commented] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181718#comment-17181718 ] ShenDa commented on FLINK-19005:

[~chesnay] Yes, I use Flink 1.11.0 with its batch WordCount example in a JDK 11 environment.
[jira] [Comment Edited] (FLINK-19005) used metaspace grow on every execution
[ https://issues.apache.org/jira/browse/FLINK-19005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181704#comment-17181704 ] ShenDa edited comment on FLINK-19005 at 8/21/20, 8:16 AM:

[~mapohl] Hi Matthias, I have hit the same problem as [~gestevez]. To rule out a Metaspace leak caused by my own code, I deliberately submitted the WordCount job. On every run the Metaspace grows and is never released; no matter how large I configure the Metaspace, the Metaspace OOM always occurs eventually.
[jira] [Comment Edited] (FLINK-15649) Support mounting volumes
[ https://issues.apache.org/jira/browse/FLINK-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164259#comment-17164259 ] ShenDa edited comment on FLINK-15649 at 7/24/20, 8:31 AM:

[~azagrebin] I agree with the approach of implementing the pod template first and then coming back to discuss how we can introduce this feature. I'll close the PR.

> Support mounting volumes
> -------------------------
>
>                 Key: FLINK-15649
>                 URL: https://issues.apache.org/jira/browse/FLINK-15649
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Deployment / Kubernetes
>            Reporter: Canbin Zheng
>            Priority: Major
>              Labels: pull-request-available
>
> Add support for mounting K8S volumes, including emptydir, hostpath, pv etc.
[jira] [Comment Edited] (FLINK-15649) Support mounting volumes
[ https://issues.apache.org/jira/browse/FLINK-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157327#comment-17157327 ] ShenDa edited comment on FLINK-15649 at 7/14/20, 12:22 PM:

[~felixzheng] The pod template and mounting volumes are two different features. We can't drop this ticket's feature just because the pod template can do the same work; by that argument, the already existing features (mounting the Hadoop conf and mounting the Flink conf) could also be removed.
[jira] [Commented] (FLINK-15649) Support mounting volumes
[ https://issues.apache.org/jira/browse/FLINK-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157309#comment-17157309 ] ShenDa commented on FLINK-15649:

[~felixzheng] I don't think we can replace this feature with the pod template, because users may just want to mount a volume without configuring any other pod specification. I also ran into the problem of the complicated config options that volume mounting introduces while I was working on this, but I don't think it is a big problem. BTW, could you describe the config options you have designed in your internal branch, so that we can discuss this issue and reach an agreement?
[jira] [Commented] (FLINK-15649) Support mounting volumes
[ https://issues.apache.org/jira/browse/FLINK-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17157287#comment-17157287 ] ShenDa commented on FLINK-15649:

[~tison] I have already done some work on this feature; could you assign this ticket to me so I can finish it?
[jira] [Commented] (FLINK-15649) Support mounting volumes
[ https://issues.apache.org/jira/browse/FLINK-15649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17156499#comment-17156499 ] ShenDa commented on FLINK-15649:

[~felixzheng], do you have any plan to do this work?
[jira] [Resolved] (FLINK-18244) Support setup customized system environment before submitting test job
[ https://issues.apache.org/jira/browse/FLINK-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa resolved FLINK-18244.
    Resolution: Not A Problem
[jira] [Updated] (FLINK-18244) Support setup customized system environment before submitting test job
[ https://issues.apache.org/jira/browse/FLINK-18244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-18244:
    Description: updated to fix a typo in item 3 ("involing" -> "invoking"); otherwise unchanged from the description in the Created entry below.
[jira] [Created] (FLINK-18244) Support setup customized system environment before submitting test job
ShenDa created FLINK-18244:
    Summary: Support setup customized system environment before submitting test job
        Key: FLINK-18244
        URL: https://issues.apache.org/jira/browse/FLINK-18244
    Project: Flink
 Issue Type: Improvement
 Components: Tests
   Reporter: ShenDa

The new approach to implementing e2e tests suggests developers use FlinkDistribution to submit the test job. But at present, we can't specify system environment variables when invoking submitSqlJob() or submitJob(). As a result, some connectors cannot work if the required system environment is not set up, such as the HBase connector, which needs HADOOP_CLASSPATH.
So I think we can do the following:
1) Add a new method to AutoClosableProcess and its builder class for setting specified environment variables.
2) Add a new interface that is just used to configure the system environment, and let the classes SQLJobSubmission and JobSubmission extend this interface.
3) Modify the methods submitJob() and submitSQLJob() in FlinkDistribution to set up the system environment before invoking runBlocking() or runNonBlocking().
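As an illustration of step 1, a builder hook for environment variables might look roughly like the sketch below. The class and method names (EnvAwareProcessBuilder, setEnv) are hypothetical stand-ins, not the actual AutoClosableProcess API; the point is only that ProcessBuilder.environment() is where the variables would be applied before the test job process is started.

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a process builder that accepts extra environment variables. */
public final class EnvAwareProcessBuilder {

    private final String[] commands;
    private final Map<String, String> env = new HashMap<>();

    public EnvAwareProcessBuilder(String... commands) {
        this.commands = commands;
    }

    /** Step 1 of the proposal: let callers register environment variables, e.g. HADOOP_CLASSPATH. */
    public EnvAwareProcessBuilder setEnv(String key, String value) {
        env.put(key, value);
        return this;
    }

    /** Applies the registered variables before the test job process is started. */
    public Process start() throws IOException {
        ProcessBuilder processBuilder = new ProcessBuilder(commands);
        processBuilder.environment().putAll(env);
        return processBuilder.start();
    }
}
{code}

A test could then do something like new EnvAwareProcessBuilder("bin/flink", "run", jobJar).setEnv("HADOOP_CLASSPATH", hadoopClasspath).start().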
[jira] [Commented] (FLINK-17678) Create flink-sql-connector-hbase module to shade HBase
[ https://issues.apache.org/jira/browse/FLINK-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120672#comment-17120672 ] ShenDa commented on FLINK-17678:

OK, I'm going to use another way to implement the HBase e2e test.
[jira] [Commented] (FLINK-17678) Support fink-sql-connector-hbase
[ https://issues.apache.org/jira/browse/FLINK-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120392#comment-17120392 ] ShenDa commented on FLINK-17678:

[~jark] Because the logic in test_sql_client.sh only allows four name prefixes for entries in a shaded jar (org/apache/flink, META-INF, LICENSE, NOTICE), every class in the shaded jar must start with org.apache.flink. But to avoid exceptions thrown by the HBase region server, the classes under org.apache.hadoop.hbase.codec.* cannot be relocated. As a result, the jar I shaded contains classes that do not start with org.apache.flink and fails the verification in test_sql_client.sh. And thanks for your suggestion, I'll try implementing the HBase e2e test in Java instead of shell scripts.
[jira] [Comment Edited] (FLINK-17678) Support fink-sql-connector-hbase
[ https://issues.apache.org/jira/browse/FLINK-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119427#comment-17119427 ] ShenDa edited comment on FLINK-17678 at 5/29/20, 9:23 AM:

I encountered a problem: the HBase shaded jar we built can't pass the verification in test_sql_client.sh:

{code:sh}
if ! [[ $EXTRACTED_FILE = "$EXTRACTED_JAR/org/apache/flink"* ]] && \
   ! [[ $EXTRACTED_FILE = "$EXTRACTED_JAR/META-INF"* ]] && \
   ! [[ $EXTRACTED_FILE = "$EXTRACTED_JAR/LICENSE"* ]] && \
   ! [[ $EXTRACTED_FILE = "$EXTRACTED_JAR/NOTICE"* ]] ; then
  echo "Bad file in JAR: $EXTRACTED_FILE"
  exit 1
fi
{code}

To avoid exceptions thrown by the HBase region server, we did not relocate org.apache.hadoop.hbase.codec.*, because the HBase server cannot find the relocated codec classes when decoding data. So I relocate the HBase dependencies with the following configuration:

{code:xml}
<relocation>
  <pattern>org.apache.hbase</pattern>
  <shadedPattern>org.apache.flink.hbase.shaded.org.apache.hbase</shadedPattern>
  <excludes>
    <exclude>org.apache.hadoop.hbase.codec.*</exclude>
  </excludes>
</relocation>
{code}

But built this way, the shaded HBase jar contains a directory named org/apache/hadoop/hbase, which violates the rule {{! [[ $EXTRACTED_FILE = "$EXTRACTED_JAR/org/apache/flink"* ]]}}. So how can I solve this problem? Could I modify the test_sql_client.sh script?
[jira] [Commented] (FLINK-17939) Translate "Python Table API Installation" page into Chinese
[ https://issues.apache.org/jira/browse/FLINK-17939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116695#comment-17116695 ] ShenDa commented on FLINK-17939:

Hi [~jark], I'm interested in this issue; could you assign it to me so I can translate this page into Chinese?

> Translate "Python Table API Installation" page into Chinese
> -------------------------------------------------------------
>
>                 Key: FLINK-17939
>                 URL: https://issues.apache.org/jira/browse/FLINK-17939
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / Python, chinese-translation, Documentation
>            Reporter: Jark Wu
>            Priority: Major
>
> The page url is https://ci.apache.org/projects/flink/flink-docs-master/zh/dev/table/python/installation.html
> The markdown file is located in flink/docs/dev/table/python/installation.zh.md
[jira] [Commented] (FLINK-14359) Create a module called flink-sql-connector-hbase to shade HBase
[ https://issues.apache.org/jira/browse/FLINK-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116492#comment-17116492 ] ShenDa commented on FLINK-14359:

[~openinx] OK.

> Create a module called flink-sql-connector-hbase to shade HBase
> -----------------------------------------------------------------
>
>                 Key: FLINK-14359
>                 URL: https://issues.apache.org/jira/browse/FLINK-14359
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / HBase
>            Reporter: Jingsong Lee
>            Assignee: Zheng Hu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>
> We need to do the same thing for HBase as for Kafka and Elasticsearch.
[jira] [Comment Edited] (FLINK-14359) Create a module called flink-sql-connector-hbase to shade HBase
[ https://issues.apache.org/jira/browse/FLINK-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116480#comment-17116480 ] ShenDa edited comment on FLINK-14359 at 5/26/20, 6:35 AM:

Hi [~openinx], I'm currently working on shading the HBase connector. Coincidentally, I found that you have already done most of this work, but there has been no further activity since your PR, and there are still some conflicts blocking a merge of your code. So do you have any plan to finish this work?
[jira] [Commented] (FLINK-14359) Create a module called flink-sql-connector-hbase to shade HBase
[ https://issues.apache.org/jira/browse/FLINK-14359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116480#comment-17116480 ] ShenDa commented on FLINK-14359: Hi [~openinx], I'm now working on shading the HBase connector. Fortunately, I found you have already done most of the work, but there has been no activity since your PR and there are still some conflicts blocking your code from being merged. So do you have any plan to finish this work? > Create a module called flink-sql-connector-hbase to shade HBase > --- > > Key: FLINK-14359 > URL: https://issues.apache.org/jira/browse/FLINK-14359 > Project: Flink > Issue Type: New Feature > Components: Connectors / HBase >Reporter: Jingsong Lee >Assignee: Zheng Hu >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > Time Spent: 10m > Remaining Estimate: 0h > > We need do the same thing as kafka and elasticsearch to HBase. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17678) Support fink-sql-connector-hbase
[ https://issues.apache.org/jira/browse/FLINK-17678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106941#comment-17106941 ] ShenDa commented on FLINK-17678: [~jark] Yes, maybe it's not easy to shade, but I have already implemented a simple shaded uber jar with only light testing. I can try to make it robust; could this issue be assigned to me? > Support fink-sql-connector-hbase > > > Key: FLINK-17678 > URL: https://issues.apache.org/jira/browse/FLINK-17678 > Project: Flink > Issue Type: Improvement > Components: Connectors / HBase >Reporter: ShenDa >Priority: Major > Fix For: 1.11.0 > > > Currently, flink doesn't contains a hbase uber jar, so users have to add > hbase dependency manually. > Could I create new module called flink-sql-connector-hbase like elasticsaerch > and kafka sql -connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-17678) Support fink-sql-connector-hbase
ShenDa created FLINK-17678: -- Summary: Support fink-sql-connector-hbase Key: FLINK-17678 URL: https://issues.apache.org/jira/browse/FLINK-17678 Project: Flink Issue Type: Improvement Reporter: ShenDa Currently, Flink doesn't contain an HBase uber jar, so users have to add the HBase dependency manually. Could I create a new module called flink-sql-connector-hbase, like the Elasticsearch and Kafka SQL connectors? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-16713) Support source mode of elasticsearch connector
[ https://issues.apache.org/jira/browse/FLINK-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106823#comment-17106823 ] ShenDa commented on FLINK-16713: [~jark], I'm a colleague of [~jackylau]. Our company truly has the scenario of reading from Elasticsearch, as other companies do. As you said above, we can start by implementing a bounded ES source. So is it possible for us to draft a new design doc to discuss the details? > Support source mode of elasticsearch connector > -- > > Key: FLINK-16713 > URL: https://issues.apache.org/jira/browse/FLINK-16713 > Project: Flink > Issue Type: Improvement > Components: Connectors / ElasticSearch >Affects Versions: 1.10.0 >Reporter: jackray wang >Priority: Major > > [https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#elasticsearch-connector] > For append-only queries, the connector can also operate in [append > mode|https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/connect.html#update-modes] > for exchanging only INSERT messages with the external system. If no key is > defined by the query, a key is automatically generated by Elasticsearch. > I want to know ,why the connector of flink with ES just support sink but > doesn't support source .Which version could add this feature to ? -- This message was sent by Atlassian Jira (v8.3.4#803005)
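For context on what a bounded ES source could look like, here is a minimal sketch, assuming the Elasticsearch high-level REST client; the host, index name, and single-request read are placeholder simplifications rather than a design agreed in this thread:
{code:java}
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

/** Hypothetical bounded source that emits each document of an index as a JSON string, then finishes. */
public class BoundedElasticsearchSource extends RichSourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (RestHighLevelClient client =
                new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            SearchRequest request = new SearchRequest("my-index");
            request.source(new SearchSourceBuilder().query(QueryBuilders.matchAllQuery()).size(1000));
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            for (SearchHit hit : response.getHits().getHits()) {
                if (!running) {
                    break;
                }
                ctx.collect(hit.getSourceAsString());
            }
        }
        // Returning from run() makes the source bounded; a real implementation would page
        // through results with the scroll or search_after API instead of a single request.
    }

    @Override
    public void cancel() {
        running = false;
    }
}
{code}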
[jira] [Resolved] (FLINK-14868) Provides the ability for multiple sinks to write data serially
[ https://issues.apache.org/jira/browse/FLINK-14868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa resolved FLINK-14868. Resolution: Resolved We implemented a TableSink that wraps the other sinks that need to be written serially. > Provides the ability for multiple sinks to write data serially > -- > > Key: FLINK-14868 > URL: https://issues.apache.org/jira/browse/FLINK-14868 > Project: Flink > Issue Type: Wish > Components: API / DataStream, Table SQL / Runtime >Affects Versions: 1.9.1 >Reporter: ShenDa >Priority: Major > > At present, Flink can use multiple sinks to write data into different data > source such as HBase,Kafka,Elasticsearch,etc.And this process is concurrent > ,in other words, one record will be written into data sources simultaneously. > But there is no approach that can sinking data serially.We really wish Flink > can providing this kind of ability that a sink can write data into target > database only after the previous sink transfers data successfully.And if the > previous sink encounters any exception, the next sink will not work. > h1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
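To make the wrapping approach concrete, a minimal sketch of a serially invoking wrapper sink; this is not the TableSink mentioned in the resolution, and the class name, delegate list, and the omitted open/close handling are assumptions for illustration only:
{code:java}
import java.util.List;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;

/** Hypothetical wrapper that forwards each record to its delegate sinks strictly in order. */
public class SerialSink<T> extends RichSinkFunction<T> {

    private final List<SinkFunction<T>> delegates;

    public SerialSink(List<SinkFunction<T>> delegates) {
        this.delegates = delegates;
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        // Each delegate only sees the record after the previous one returned successfully;
        // if a delegate throws, the remaining delegates are skipped and the failure propagates.
        for (SinkFunction<T> delegate : delegates) {
            delegate.invoke(value, context);
        }
    }
}
{code}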
[jira] [Resolved] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa resolved FLINK-15182. Resolution: Resolved FLIP-115 is discussing this issue > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > table sinks, BulkFileTableSink and RowFileTableSink, which write data by > using one format respectively? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-14868) Provides the ability for multiple sinks to write data serially
[ https://issues.apache.org/jira/browse/FLINK-14868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995265#comment-16995265 ] ShenDa edited comment on FLINK-14868 at 12/13/19 1:32 AM: -- [~rmetzger] Thanks for your reply; it's a nice idea to implement a custom sink that wraps a series of sinks to write data together. But if I want to use the Table API & SQL to sink data into different targets in order, is there any good way to achieve this? was (Author: dadashen): [~rmetzger] Thanks for your reply, it's a nice a idea to implement a custom sink that wrap series sinks to write data together. But, if I want to use TableAPI&SQL to sink data into different targets by order, is there any good way to archive this? > Provides the ability for multiple sinks to write data serially > -- > > Key: FLINK-14868 > URL: https://issues.apache.org/jira/browse/FLINK-14868 > Project: Flink > Issue Type: Wish > Components: API / DataStream, Table SQL / Runtime >Affects Versions: 1.9.1 >Reporter: ShenDa >Priority: Major > > At present, Flink can use multiple sinks to write data into different data > source such as HBase,Kafka,Elasticsearch,etc.And this process is concurrent > ,in other words, one record will be written into data sources simultaneously. > But there is no approach that can sinking data serially.We really wish Flink > can providing this kind of ability that a sink can write data into target > database only after the previous sink transfers data successfully.And if the > previous sink encounters any exception, the next sink will not work. > h1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-14868) Provides the ability for multiple sinks to write data serially
[ https://issues.apache.org/jira/browse/FLINK-14868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995265#comment-16995265 ] ShenDa commented on FLINK-14868: [~rmetzger] Thanks for your reply; it's a nice idea to implement a custom sink that wraps a series of sinks to write data together. But if I want to use the Table API & SQL to sink data into different targets in order, is there any good way to achieve this? > Provides the ability for multiple sinks to write data serially > -- > > Key: FLINK-14868 > URL: https://issues.apache.org/jira/browse/FLINK-14868 > Project: Flink > Issue Type: Wish > Components: API / DataStream, Table SQL / Runtime >Affects Versions: 1.9.1 >Reporter: ShenDa >Priority: Major > > At present, Flink can use multiple sinks to write data into different data > source such as HBase,Kafka,Elasticsearch,etc.And this process is concurrent > ,in other words, one record will be written into data sources simultaneously. > But there is no approach that can sinking data serially.We really wish Flink > can providing this kind of ability that a sink can write data into target > database only after the previous sink transfers data successfully.And if the > previous sink encounters any exception, the next sink will not work. > h1. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994575#comment-16994575 ] ShenDa commented on FLINK-15182: [~jark] We are very glad to see the intention of implementing a unified FileSystem TableSink. But we also find that the code in the filesystem connector is deprecated, and the flink-table module contains CsvTableSink & CsvTableSource. So we think the current hierarchical structure of the FileSystem connector needs to be refactored. Besides, maybe CSV should be a kind of format, like JSON, rather than an individual TableSource or TableSink. > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > table sinks, BulkFileTableSink and RowFileTableSink, which write data by > using one format respectively? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-15182: --- Description: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two table sinks, BulkFileTableSink and RowFileTableSink, which write data by using one format respectively? was: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two table sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively? > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Affects Versions: 1.9.0 >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > table sinks, BulkFileTableSink and RowFileTableSink, which write data by > using one format respectively? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-15182: --- Description: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two table sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively? was: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two table sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Affects Versions: 1.9.0 >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > table sinks, BulkFileTableSink and RowFileTableSink, which writes data by > using one format respectively? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-15182: --- Description: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two table sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. was: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Affects Versions: 1.9.0 >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > table sinks, BulkFileTableSink and RowFileTableSink, which writes data by > using one format respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-15182: --- Description: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink have to support bulk format and row format simultaneously? Or separating this sink into two sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. was: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink support bulk format and row format simultaneously? Or separating this sink into two sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Affects Versions: 1.9.0 >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink have to support > bulk format and row format simultaneously? Or separating this sink into two > sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one > format respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (FLINK-15182) Add FileTableSink based on StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ShenDa updated FLINK-15182: --- Description: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according to the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink support bulk format and row format simultaneously? Or separating this sink into two sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. was: Flink already has FileSystem connector that can sink data into specified file system, but it just writes data to a target path without bukcet and it can't configure file rolling policy neither. We can use StreamingFileSink to avoid those shortcomings instead, however, the StreamingFileSink is a SinkFunction that users have to implement their processing logic with DataStreamAPI. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect file system by using TableAPI&SQL. But according the two ways of StreamingFileSink to sinking data, bulk format and row format, it comes a question to us. Does the FileTableSink support bulk format and row format simultaneously? Or separating this sink into two sinks, BulkFileTableSink and RowFileTableSink, which writes data by using one format respectively. > Add FileTableSink based on StreamingFileSink > > > Key: FLINK-15182 > URL: https://issues.apache.org/jira/browse/FLINK-15182 > Project: Flink > Issue Type: Improvement > Components: Connectors / FileSystem >Affects Versions: 1.9.0 >Reporter: ShenDa >Priority: Major > > Flink already has FileSystem connector that can sink data into specified file > system, but it just writes data to a target path without bukcet and it can't > configure file rolling policy neither. We can use StreamingFileSink to avoid > those shortcomings instead, however, the StreamingFileSink is a SinkFunction > that users have to implement their processing logic with DataStreamAPI. > Therefore, we want to add a new FileTableSink based on StreamingFileSink. > This will help us connect file system by using TableAPI&SQL. But according to > the two ways of StreamingFileSink to sinking data, bulk format and row > format, it comes a question to us. Does the FileTableSink support bulk format > and row format simultaneously? Or separating this sink into two sinks, > BulkFileTableSink and RowFileTableSink, which writes data by using one format > respectively. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-15182) Add FileTableSink based on StreamingFileSink
ShenDa created FLINK-15182: -- Summary: Add FileTableSink based on StreamingFileSink Key: FLINK-15182 URL: https://issues.apache.org/jira/browse/FLINK-15182 Project: Flink Issue Type: Improvement Components: Connectors / FileSystem Affects Versions: 1.9.0 Reporter: ShenDa Flink already has a FileSystem connector that can sink data into a specified file system, but it just writes data to a target path without buckets, and it can't configure a file rolling policy either. We can use StreamingFileSink to avoid those shortcomings instead; however, StreamingFileSink is a SinkFunction, so users have to implement their processing logic with the DataStream API. Therefore, we want to add a new FileTableSink based on StreamingFileSink. This will help us connect to file systems by using the Table API & SQL. But the two ways StreamingFileSink writes data, bulk format and row format, raise a question for us. Does the FileTableSink support bulk format and row format simultaneously? Or should this sink be separated into two sinks, BulkFileTableSink and RowFileTableSink, each writing data using one format? -- This message was sent by Atlassian Jira (v8.3.4#803005)
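For readers unfamiliar with the two write modes mentioned in the description, a minimal sketch of the existing DataStream-level StreamingFileSink API that the proposed FileTableSink would wrap; the output path and string encoder are placeholder choices:
{code:java}
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class StreamingFileSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> lines = env.fromElements("a", "b", "c");

        // Row format: each record is encoded and appended to the in-progress part file.
        StreamingFileSink<String> rowSink = StreamingFileSink
                .forRowFormat(new Path("/tmp/flink-row-output"), new SimpleStringEncoder<String>("UTF-8"))
                .build();
        // The bulk-format variant, StreamingFileSink.forBulkFormat(...), takes a BulkWriter.Factory
        // (e.g. for Parquet or ORC) and rolls part files on checkpoints instead of a size/time policy.

        lines.addSink(rowSink);
        env.execute("StreamingFileSink row-format example");
    }
}
{code}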
[jira] [Commented] (FLINK-12584) Add Bucket File Syetem Table Sink
[ https://issues.apache.org/jira/browse/FLINK-12584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16987569#comment-16987569 ] ShenDa commented on FLINK-12584: [~zhangjun] Is there any progress on this issue? Our group also needs a bucket filesystem table sink. > Add Bucket File Syetem Table Sink > - > > Key: FLINK-12584 > URL: https://issues.apache.org/jira/browse/FLINK-12584 > Project: Flink > Issue Type: New Feature > Components: Table SQL / API >Affects Versions: 1.8.0, 1.9.0 >Reporter: Jun Zhang >Assignee: Jun Zhang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > 1. *Motivation* > In flink, the file system (especially hdfs) is a very common output, but for > users using sql, it does not support directly using sql to write data to the > file system, so I want to add a bucket file system table sink, the user can > register it to StreamTableEnvironment, so that table api and sql api can > directly use the sink to write stream data to filesystem > 2.*example* > tEnv.connect(new Bucket().basePath("hdfs://localhost/tmp/flink-data")) > .withFormat(new Json().deriveSchema()) > .withSchema(new Schema() > .field("name", Types. STRING ()) > .field("age", Types. INT ()) > .inAppendMode() > .registerTableSink("myhdfssink"); > tEnv.sqlUpdate("insert into myhdfssink SELECT * FROM mytablesource"); > > 3.*Some ideas to achieve this function* > 1) Add a class Bucket which extends from ConnectorDescriptor, add some > properties, such as basePath. > 2) Add a class BucketValidator which extends from the > ConnectorDescriptorValidator and is used to check the bucket descriptor. > 3) Add a class FileSystemTableSink to implement the StreamTableSink > interface. In the emitDataStream method, construct StreamingFileSink for > writing data to filesystem according to different properties. > 4) Add a factory class FileSystemTableSinkFactory to implement the > StreamTableSinkFactory interface for constructing FileSystemTableSink > 5) The parameters of withFormat method is the implementation classes of the > FormatDescriptor interface, such as Json, Csv, and we can add Parquet、Orc > later. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-14868) Provides the ability for multiple sinks to write data serially
ShenDa created FLINK-14868: -- Summary: Provides the ability for multiple sinks to write data serially Key: FLINK-14868 URL: https://issues.apache.org/jira/browse/FLINK-14868 Project: Flink Issue Type: Wish Components: API / DataStream, Table SQL / Runtime Affects Versions: 1.9.1 Reporter: ShenDa Fix For: 1.9.2 At present, Flink can use multiple sinks to write data into different data sources such as HBase, Kafka, Elasticsearch, etc., and this process is concurrent; in other words, one record is written into all data sources simultaneously. But there is no approach for sinking data serially. We really wish Flink could provide this kind of ability: a sink writes data into its target database only after the previous sink has transferred the data successfully, and if the previous sink encounters any exception, the next sink will not run. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-13661) Add a stream specific CREATE TABLE SQL DDL
[ https://issues.apache.org/jira/browse/FLINK-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935513#comment-16935513 ] ShenDa edited comment on FLINK-13661 at 9/23/19 2:27 AM: - OK, thanks [~jark]. Where could we find or join this design discussion? was (Author: dadashen): OK thanks, [~jark]. Where cloud we find or join this design discuss? > Add a stream specific CREATE TABLE SQL DDL > -- > > Key: FLINK-13661 > URL: https://issues.apache.org/jira/browse/FLINK-13661 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Jark Wu >Priority: Major > Fix For: 1.10.0 > > > FLINK-6962 has introduced a basic SQL DDL to create a table. However, it > doesn't support stream specific features, for example, watermark definition, > changeflag definition, computed columns, primary keys and so on. > We started a design doc[1] to discuss the concepts of source and sink to help > us have a well-defined DDL. Once the FLIP is accepted, we can start the work. > [1]: > https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#heading=h.c05t427gfgxa > Considering the streaming DDL is a huge topic, we decided to split it into > separate FLIPs. Here comes the first one. > FLIP-66: Support time attribute in SQL DDL: > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit# -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13661) Add a stream specific CREATE TABLE SQL DDL
[ https://issues.apache.org/jira/browse/FLINK-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935513#comment-16935513 ] ShenDa commented on FLINK-13661: OK, thanks [~jark]. Where could we find or join this design discussion? > Add a stream specific CREATE TABLE SQL DDL > -- > > Key: FLINK-13661 > URL: https://issues.apache.org/jira/browse/FLINK-13661 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Jark Wu >Priority: Major > Fix For: 1.10.0 > > > FLINK-6962 has introduced a basic SQL DDL to create a table. However, it > doesn't support stream specific features, for example, watermark definition, > changeflag definition, computed columns, primary keys and so on. > We started a design doc[1] to discuss the concepts of source and sink to help > us have a well-defined DDL. Once the FLIP is accepted, we can start the work. > [1]: > https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#heading=h.c05t427gfgxa > Considering the streaming DDL is a huge topic, we decided to split it into > separate FLIPs. Here comes the first one. > FLIP-66: Support time attribute in SQL DDL: > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit# -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-13661) Add a stream specific CREATE TABLE SQL DDL
[ https://issues.apache.org/jira/browse/FLINK-13661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935501#comment-16935501 ] ShenDa commented on FLINK-13661: Hi all, is there any progress or issue for the combined table that combines historical and changelog data? It would be really useful for us to use this kind of table. > Add a stream specific CREATE TABLE SQL DDL > -- > > Key: FLINK-13661 > URL: https://issues.apache.org/jira/browse/FLINK-13661 > Project: Flink > Issue Type: Sub-task > Components: Table SQL / API >Reporter: Jark Wu >Priority: Major > Fix For: 1.10.0 > > > FLINK-6962 has introduced a basic SQL DDL to create a table. However, it > doesn't support stream specific features, for example, watermark definition, > changeflag definition, computed columns, primary keys and so on. > We started a design doc[1] to discuss the concepts of source and sink to help > us have a well-defined DDL. Once the FLIP is accepted, we can start the work. > [1]: > https://docs.google.com/document/d/1yrKXEIRATfxHJJ0K3t6wUgXAtZq8D-XgvEnvl2uUcr0/edit#heading=h.c05t427gfgxa > Considering the streaming DDL is a huge topic, we decided to split it into > separate FLIPs. Here comes the first one. > FLIP-66: Support time attribute in SQL DDL: > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit# -- This message was sent by Atlassian Jira (v8.3.4#803005)
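As a concrete illustration of the stream-specific features listed in the quoted description (written against the DDL syntax that later FLIPs eventually introduced, not something available at the time of these comments), a sketch with a computed column and a watermark definition; the table name, fields, and connector option are placeholders:
{code:java}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class StreamingDdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

        // Computed column (order_time derived from an epoch-millis BIGINT field) and a
        // watermark definition on that time attribute, the stream-specific pieces discussed above.
        tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  order_id BIGINT," +
                "  ts_millis BIGINT," +
                "  order_time AS TO_TIMESTAMP_LTZ(ts_millis, 3)," +
                "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND" +
                ") WITH (" +
                "  'connector' = 'datagen'" +
                ")");
    }
}
{code}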