[jira] [Created] (FLINK-11115) Port some flink.ml algorithms to table based
Weihua Jiang created FLINK-11115: Summary: Port some flink.ml algorithms to table based Key: FLINK-11115 URL: https://issues.apache.org/jira/browse/FLINK-11115 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang This sub-task is to port some flink.ml algorithms to a table-based implementation to verify the correctness of the design and implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11114) Support wrapping inference pipeline as a UDF function in SQL
Weihua Jiang created FLINK-11114: Summary: Support wrapping inference pipeline as a UDF function in SQL Key: FLINK-11114 URL: https://issues.apache.org/jira/browse/FLINK-11114 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang Though the standard ML pipeline inference usage is table based (that is, the user at the client constructs the DAG using only the Table API), it is also desirable to wrap the inference logic as a UDF to be used in SQL. This means it shall be record based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
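The idea of a record-based wrapper around a table-based pipeline can be sketched as below. This is only an illustration with hypothetical names (`Model`, `PredictUdf`), not the actual Flink API; a real wrapper would extend Flink's `ScalarFunction` and expose an `eval(...)` method that SQL can call per record.

```java
// Sketch only: wrapping per-record inference in a scalar-UDF-style class.
// All names here are hypothetical, not actual Flink API.
public class InferenceUdfSketch {
    /** A trained model reduced to a per-record scoring function. */
    public interface Model {
        double score(double[] features);
    }

    /** Wraps the model so a SQL engine could invoke it once per record,
     *  analogous to a ScalarFunction's eval(...) method. */
    public static class PredictUdf {
        private final Model model;
        public PredictUdf(Model model) { this.model = model; }
        public double eval(double... features) { return model.score(features); }
    }

    public static void main(String[] args) {
        Model linear = f -> 0.5 * f[0] + 0.25 * f[1];   // toy linear model
        PredictUdf udf = new PredictUdf(linear);
        System.out.println(udf.eval(2.0, 4.0));          // 2.0
    }
}
```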
[jira] [Created] (FLINK-11113) Support periodically update models when inferencing
Weihua Jiang created FLINK-11113: Summary: Support periodically update models when inferencing Key: FLINK-11113 URL: https://issues.apache.org/jira/browse/FLINK-11113 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang As models will be periodically updated, and the inference job may run on a stream and NOT finish, it is important for the inference job to periodically reload the latest model for inference without being stopped and restarted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
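One way such a hot reload could work is an atomic model holder refreshed on a timer, so readers never block and the job never restarts. A minimal sketch, assuming a hypothetical `loader` that fetches the latest exported weights from storage:

```java
// Sketch only: periodic model reload for a long-running inference job.
// The loader and the weight-vector model are hypothetical stand-ins.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class ReloadingModel {
    private final AtomicReference<double[]> weights = new AtomicReference<>();
    private final Supplier<double[]> loader;   // e.g. reads the latest export

    public ReloadingModel(Supplier<double[]> loader) {
        this.loader = loader;
        weights.set(loader.get());             // initial load
    }

    /** Periodically swap in the latest model; scoring threads never block. */
    public ScheduledExecutorService startReloading(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> weights.set(loader.get()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public double score(double[] features) {
        double[] w = weights.get();            // consistent snapshot per record
        double s = 0;
        for (int i = 0; i < w.length; i++) s += w[i] * features[i];
        return s;
    }
}
```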
[jira] [Created] (FLINK-11112) Support pipeline import/export
Weihua Jiang created FLINK-11112: Summary: Support pipeline import/export Key: FLINK-11112 URL: https://issues.apache.org/jira/browse/FLINK-11112 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang It is quite common to have one job that trains and periodically exports the trained pipeline and models, and another job that loads the exported pipeline for inference. Thus, we will need functionality for pipeline import/export. This shall work in both streaming and batch environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
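At its core, export/import means serializing each stage's name and parameters to some persistent text form and reconstructing the stage descriptions on load. A minimal round-trip sketch with a made-up one-line-per-stage format (the real design would likely use JSON and re-instantiate stage classes):

```java
// Sketch only: a minimal, hypothetical export/import of a pipeline
// description as text, one "StageName key=value ..." line per stage.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PipelinePersistence {
    public static class StageDesc {
        public final String name;
        public final Map<String, String> params;
        public StageDesc(String name, Map<String, String> params) {
            this.name = name;
            this.params = params;
        }
    }

    /** Export: one line per stage, e.g. "StandardScaler withMean=true". */
    public static String export(List<StageDesc> stages) {
        StringBuilder sb = new StringBuilder();
        for (StageDesc s : stages) {
            sb.append(s.name);
            s.params.forEach((k, v) -> sb.append(' ').append(k).append('=').append(v));
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Import: parse the same format back into stage descriptions. */
    public static List<StageDesc> load(String text) {
        List<StageDesc> stages = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.isEmpty()) continue;
            String[] parts = line.split(" ");
            Map<String, String> params = new LinkedHashMap<>();
            for (int i = 1; i < parts.length; i++) {
                String[] kv = parts[i].split("=", 2);
                params.put(kv[0], kv[1]);
            }
            stages.add(new StageDesc(parts[0], params));
        }
        return stages;
    }
}
```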
[jira] [Created] (FLINK-11111) Create a new set of parameters
Weihua Jiang created FLINK-11111: Summary: Create a new set of parameters Key: FLINK-11111 URL: https://issues.apache.org/jira/browse/FLINK-11111 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang One goal of the new Table-based ML Pipeline is easy tooling. That is, any ML/AI algorithm adapting to this ML Pipeline standard shall declare all its parameters via a well-defined interface, so that an AI platform can uniformly get/set the corresponding parameters while staying agnostic about the details of the specific algorithm. The only difference between algorithms, from a user's perspective, is the name; all the other algorithm parameters are self-descriptive. This will also be useful for future Flink ML SQL, as the SQL parser can uniformly handle all these parameters. This can greatly simplify the SQL parser. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
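The "well-defined interface" idea can be sketched as a typed parameter descriptor plus a generic get/set container, so tooling can enumerate and configure parameters without knowing the algorithm's concrete class. The names below (`ParamInfo`, `Params`, `MAX_ITER`) are hypothetical illustrations, not the eventual API:

```java
// Sketch only: a uniform, self-describing parameter interface of the kind
// the issue proposes. All names are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class ParamsSketch {
    /** Describes one parameter: name, type, default. Tooling can enumerate these. */
    public static final class ParamInfo<T> {
        final String name;
        final Class<T> type;
        final T defaultValue;
        public ParamInfo(String name, Class<T> type, T defaultValue) {
            this.name = name; this.type = type; this.defaultValue = defaultValue;
        }
    }

    /** Generic get/set container: a platform can configure any algorithm
     *  uniformly, agnostic of its concrete class. */
    public static final class Params {
        private final Map<String, Object> values = new HashMap<>();
        public <T> Params set(ParamInfo<T> info, T value) {
            values.put(info.name, value);
            return this;
        }
        public <T> T get(ParamInfo<T> info) {
            return info.type.cast(values.getOrDefault(info.name, info.defaultValue));
        }
    }

    // Example: an algorithm declares its parameters as typed constants.
    public static final ParamInfo<Integer> MAX_ITER =
            new ParamInfo<>("maxIter", Integer.class, 100);

    public static void main(String[] args) {
        Params p = new Params().set(MAX_ITER, 50);
        System.out.println(p.get(MAX_ITER));   // 50
    }
}
```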
[jira] [Created] (FLINK-11109) Create Table based optimizers
Weihua Jiang created FLINK-11109: Summary: Create Table based optimizers Key: FLINK-11109 URL: https://issues.apache.org/jira/browse/FLINK-11109 Project: Flink Issue Type: Sub-task Reporter: Weihua Jiang The existing optimizers in the org.apache.flink.ml package are DataSet based. This task is to create a new set of optimizers that are table based. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
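For context, the computation such an optimizer performs is straightforward; the sub-task is to express it over Tables rather than DataSets. A sketch of one SGD epoch for least squares over an in-memory list, purely as a reference for what the table-based version must compute:

```java
// Sketch only: one epoch of stochastic gradient descent for least squares,
// w <- w - lr * (w·x - y) * x, shown over an in-memory list. The actual
// sub-task is to implement the equivalent over the Table API.
import java.util.List;

public class SgdSketch {
    public static double[] sgdEpoch(double[] w, List<double[]> xs,
                                    List<Double> ys, double lr) {
        double[] out = w.clone();
        for (int i = 0; i < xs.size(); i++) {
            double[] x = xs.get(i);
            double pred = 0;
            for (int j = 0; j < out.length; j++) pred += out[j] * x[j];
            double err = pred - ys.get(i);                 // prediction error
            for (int j = 0; j < out.length; j++) out[j] -= lr * err * x[j];
        }
        return out;
    }
}
```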
[jira] [Created] (FLINK-11110) Support pipeline stage type inference
Weihua Jiang created FLINK-11110: Summary: Support pipeline stage type inference Key: FLINK-11110 URL: https://issues.apache.org/jira/browse/FLINK-11110 Project: Flink Issue Type: Sub-task Reporter: Weihua Jiang It is important for any AI platform to have type inference for each stage and pipeline. The ML pipeline can support this kind of type inference (static type and compatibility check) for each stage and pipeline. That is, each stage shall declare which input types and shapes it accepts and which it outputs. Then, the framework will perform a compatibility check to see whether adjacent stages are compatible. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
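The declare-then-check idea can be sketched as below, with types reduced to plain strings for illustration; a real implementation would use Flink's type information and also check shapes. Names here (`Stage`, `acceptedInputTypes`) are hypothetical:

```java
// Sketch only: the kind of static compatibility check the framework could
// run across a pipeline, with a hypothetical per-stage schema declaration.
import java.util.List;

public class SchemaCheckSketch {
    public interface Stage {
        List<String> acceptedInputTypes();  // e.g. ["DOUBLE_VECTOR"]
        String outputType();
    }

    /** Walk the pipeline and verify each stage accepts its predecessor's output. */
    public static boolean compatible(List<Stage> pipeline) {
        for (int i = 1; i < pipeline.size(); i++) {
            String produced = pipeline.get(i - 1).outputType();
            if (!pipeline.get(i).acceptedInputTypes().contains(produced)) {
                return false;               // static type mismatch between stages
            }
        }
        return true;
    }
}
```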
[jira] [Created] (FLINK-11108) Create a new set of table based ML Pipeline classes
Weihua Jiang created FLINK-11108: Summary: Create a new set of table based ML Pipeline classes Key: FLINK-11108 URL: https://issues.apache.org/jira/browse/FLINK-11108 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang The main classes are: # PipelineStage (the trait for each pipeline stage) # Estimator (training stage) # Transformer (the inference/feature engineering stage) # Pipeline (the whole pipeline) # Predictor (extends Estimator, for supervised learning) The detailed design can be found in the parent JIRA's design document. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
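The relationship between the five classes listed can be sketched as below, using a placeholder `Table` type; the real classes (per the parent JIRA's design document) would operate on Flink's `Table` and carry parameters as well:

```java
// Sketch only: the shapes of the proposed pipeline classes, with a
// placeholder Table standing in for org.apache.flink.table.api.Table.
public class PipelineApiSketch {
    public static class Table { /* placeholder */ }

    /** Common trait of every pipeline stage. */
    public interface PipelineStage { }

    /** Inference / feature-engineering stage: Table in, Table out. */
    public interface Transformer extends PipelineStage {
        Table transform(Table input);
    }

    /** Training stage: learns from a Table and yields a Transformer. */
    public interface Estimator extends PipelineStage {
        Transformer fit(Table training);
    }

    /** Supervised learning: an Estimator whose fitted model predicts labels. */
    public interface Predictor extends Estimator { }

    /** A whole pipeline is itself a Transformer over its chained stages. */
    public static class Pipeline implements Transformer {
        private final java.util.List<Transformer> stages;
        public Pipeline(java.util.List<Transformer> stages) { this.stages = stages; }
        public Table transform(Table t) {
            for (Transformer s : stages) t = s.transform(t);
            return t;
        }
    }
}
```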
[jira] [Created] (FLINK-11096) Create a new table based flink ML package
Weihua Jiang created FLINK-11096: Summary: Create a new table based flink ML package Key: FLINK-11096 URL: https://issues.apache.org/jira/browse/FLINK-11096 Project: Flink Issue Type: Sub-task Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang Currently, the DataSet based ML library is under the org.apache.flink.ml Scala package and under the flink-libraries/flink-ml directory. There are two questions related to packaging: # Shall we create a new Scala/Java package, e.g. org.apache.flink.table.ml? Or still stay in org.apache.flink.ml? # Shall we still put new code in the flink-libraries/flink-ml directory or create a new one, e.g. flink-libraries/flink-table-ml, with a corresponding Maven package? I implemented a prototype of the design and found that the new design is very hard to fit into the existing flink.ml codebase. The existing flink.ml code is tightly coupled with the DataSet API. Thus, I had to rewrite almost all parts of flink.ml to get a sample case to work. The only reusable code from flink.ml is the base math classes under the org.apache.flink.ml.math and org.apache.flink.ml.metrics.distance packages. Considering this fact, I would prefer to create a new package org.apache.flink.table.ml and a new Maven package flink-table-ml. Please feel free to give your feedback. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11095) Table based ML Pipeline
Weihua Jiang created FLINK-11095: Summary: Table based ML Pipeline Key: FLINK-11095 URL: https://issues.apache.org/jira/browse/FLINK-11095 Project: Flink Issue Type: New Feature Components: Machine Learning Library, Table API & SQL Reporter: Weihua Jiang As the Table API will be the unified high-level API for both batch and streaming, it is desirable to have a Table API based ML Pipeline definition. This new table based ML Pipeline shall: # support unified batch/stream ML/AI functionality (train/inference). # integrate seamlessly with the Flink Table based ecosystem. # provide a base for further Flink based AI platform/tooling support. # provide a base for further Flink ML SQL integration. The initial design is here: [https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing]. Based on this design, I made some initial implementation/prototyping. I will share the code later. This is the umbrella JIRA. I will create a corresponding sub-JIRA for each sub-task. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weihua Jiang updated FLINK-8020: Attachment: jstack53009(2).out Attached a new jstack file. After moving our collector.collect() call out of our lock region, the deadlock was gone. However, the system is still NOT able to run, as we find that the Flink collector hangs. See the attached jstack53009(2).out file: the thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (1/10)" holds the lock 0x0007b692ad98 and is still waiting for THIS exact lock again. This makes it hang. Our guess is that this is some issue related to async I/O. After changing our async Redis lookup to a sync Redis lookup (by not using async I/O), the issue is gone. > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack53009(2).out, jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is > [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this > one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" > hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
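The "move collect() out of the lock region" fix described in this update has a common shape: mutate state under the application lock, but invoke the collector (which may itself need Flink's checkpoint lock) only after the application lock is released. A minimal sketch of that pattern, with a hypothetical cache and a generic collector callback standing in for the real operator code:

```java
// Sketch only: buffer results while holding the application lock, then emit
// after releasing it, so the collector is never called under our own lock.
// The cache and collector here are hypothetical stand-ins for the job's code.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class CollectOutsideLock {
    private final Object appLock = new Object();
    private final List<String> cache = new ArrayList<>();

    public void process(String record, Consumer<String> collector) {
        List<String> toEmit = new ArrayList<>();
        synchronized (appLock) {           // only state mutation under our lock
            cache.add(record);
            toEmit.addAll(cache);
            cache.clear();
        }
        toEmit.forEach(collector);         // emit with no application lock held
    }
}
```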
[jira] [Commented] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243538#comment-16243538 ] Weihua Jiang commented on FLINK-8020: - Sorry, after further analysis, it seems the deadlock was caused by mixing our lock with Flink's lock. After reorganizing our code, the deadlock was eliminated. > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is > [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this > one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weihua Jiang updated FLINK-8020: Attachment: jstack67976-2.log > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is [#FLINK-8019] and another one is this one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8020) Deadlock found in Flink Streaming job
Weihua Jiang created FLINK-8020: --- Summary: Deadlock found in Flink Streaming job Key: FLINK-8020 URL: https://issues.apache.org/jira/browse/FLINK-8020 Project: Flink Issue Type: Bug Components: Kafka Connector, Streaming, Streaming Connectors Affects Versions: 1.3.2 Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode Reporter: Weihua Jiang Priority: Blocker Attachments: jstack67976-2.log Our streaming job ran into trouble these days after a long period of smooth running. One issue we found is [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this one. After analyzing the jstack, we believe we found a DEADLOCK in Flink: 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 0x0007b6aa1788. This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
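The jstack pattern reported here (thread A holds L1 and waits for L2 while thread B holds L2 and waits for L1) is the classic circular-wait deadlock. The standard remedy, shown as a generic sketch unrelated to Flink internals, is to make every thread acquire the two locks in one agreed-upon global order so the cycle cannot form:

```java
// Sketch only: the standard fix for a lock-ordering deadlock — pick one
// global order (here, by identity hash) and always acquire in that order.
public class LockOrdering {
    /** Always lock the "smaller" object first, so two threads calling
     *  withBoth(a, b) and withBoth(b, a) cannot deadlock each other. */
    public static void withBoth(Object a, Object b, Runnable action) {
        Object first = System.identityHashCode(a) <= System.identityHashCode(b) ? a : b;
        Object second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) {
                action.run();
            }
        }
    }
}
```

(In the rare case of an identity-hash tie, a real implementation would fall back to a dedicated tie-breaker lock.)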
[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job
[ https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weihua Jiang updated FLINK-8020: Description: Our streaming job run into trouble in these days after a long time smooth running. One issue we found is [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this one. After analyzing the jstack, we believe we found a DEAD LOCK in flink: 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock 0x0007b6aa1788. This DEADLOCK made the job fail to proceed. was: Our streaming job run into trouble in these days after a long time smooth running. One issue we found is [#FLINK-8019] and another one is this one. After analyzing the jstack, we believe we found a DEAD LOCK in flink: 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock 0x0007b6aa1788. This DEADLOCK made the job fail to proceed. > Deadlock found in Flink Streaming job > - > > Key: FLINK-8020 > URL: https://issues.apache.org/jira/browse/FLINK-8020 > Project: Flink > Issue Type: Bug > Components: Kafka Connector, Streaming, Streaming Connectors >Affects Versions: 1.3.2 > Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode >Reporter: Weihua Jiang >Priority: Blocker > Attachments: jstack67976-2.log > > > Our streaming job run into trouble in these days after a long time smooth > running. One issue we found is > [https://issues.apache.org/jira/browse/FLINK-8019] and another one is this > one. > After analyzing the jstack, we believe we found a DEAD LOCK in flink: > 1. 
The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" > hold lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940. > 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: > hbase-sink0 (8/8)" hold lock 0x0007b6aa1940 and is waiting for lock > 0x0007b6aa1788. > This DEADLOCK made the job fail to proceed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (FLINK-8019) Flink streaming job stopped at consuming Kafka data
Weihua Jiang created FLINK-8019: --- Summary: Flink streaming job stopped at consuming Kafka data Key: FLINK-8019 URL: https://issues.apache.org/jira/browse/FLINK-8019 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: 1.3.2 Environment: We are using Kafka 0.8.2.1 and Flink 1.3.2 on YARN mode. Reporter: Weihua Jiang Our Flink streaming job consumes data from Kafka and has worked well for a long time. However, in recent days it has stopped consuming data from Kafka several times. The jstack shows that it is stopped at LocalBufferPool.requestBuffer. The jstack file is attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (FLINK-8019) Flink streaming job stopped at consuming Kafka data
[ https://issues.apache.org/jira/browse/FLINK-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weihua Jiang updated FLINK-8019: Attachment: jstack-20171108-2.log > Flink streaming job stopped at consuming Kafka data > --- > > Key: FLINK-8019 > URL: https://issues.apache.org/jira/browse/FLINK-8019 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.3.2 > Environment: We are using Kafka 0.8.2.1 and Flink 1.3.2 on YARN mode. >Reporter: Weihua Jiang > Attachments: jstack-20171108-2.log > > > Our flink streaming job consumes data from Kafka and it worked well for a > long time. However, these days we encountered several times that it stopped > consuming data from Kafka. > The jstack shows that it stopped at LocalBufferPool.requestBuffer. The jstack > file is attached. -- This message was sent by Atlassian JIRA (v6.4.14#64029)