[jira] [Created] (FLINK-11115) Port some flink.ml algorithms to table based

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11115:


 Summary: Port some flink.ml algorithms to table based
 Key: FLINK-11115
 URL: https://issues.apache.org/jira/browse/FLINK-11115
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


This sub-task is to port some flink.ml algorithms to the Table-based API in 
order to verify the correctness of the design and implementation. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11114) Support wrapping inference pipeline as a UDF function in SQL

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11114:


 Summary: Support wrapping inference pipeline as a UDF function in 
SQL
 Key: FLINK-11114
 URL: https://issues.apache.org/jira/browse/FLINK-11114
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


Though the standard ML pipeline inference usage is table based (that is, the 
user constructs the DAG on the client side using only the Table API), it is 
also desirable to wrap the inference logic as a UDF that can be used in SQL. 
This means it shall be record based. 
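A minimal sketch of what such a wrapper could look like, assuming a hypothetical record-level RecordModel interface (an illustration, not an API from the design document) and Flink's ScalarFunction base class:

{code:scala}
import org.apache.flink.table.functions.ScalarFunction

// Hypothetical record-level model abstraction (an assumption for illustration,
// not part of the design document).
trait RecordModel extends Serializable {
  def predict(features: Array[Double]): Double
}

// Wraps the inference logic as a scalar UDF: one eval() call per record,
// which is what makes it record based rather than table based.
class PredictUdf(model: RecordModel) extends ScalarFunction {
  def eval(features: Array[Double]): Double = model.predict(features)
}
{code}

Registered via tableEnv.registerFunction("predict", new PredictUdf(model)), such a UDF could then be called from SQL like any other scalar function.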



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11113) Support periodically update models when inferencing

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11113:


 Summary: Support periodically update models when inferencing
 Key: FLINK-11113
 URL: https://issues.apache.org/jira/browse/FLINK-11113
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


As models will be updated periodically and the inference job may run on a 
stream and never finish, it is important for the inference job to periodically 
reload the latest model for inference without having to stop and restart the 
job. 
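As an illustration of one possible mechanism (not the proposed design), the sketch below swaps a model reference on a scheduler, so the running inference job always reads a complete model and never needs a restart; the Model type and loadModel function are assumptions:

{code:scala}
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object ReloadingModelSketch {
  // For illustration, a "model" is just a scoring function over features.
  type Model = Array[Double] => Double

  class ReloadingModel(loadModel: () => Model, intervalMinutes: Long) {
    // Readers always see either the old or the new model, never a partial one.
    private val current = new AtomicReference[Model](loadModel())
    private val scheduler = Executors.newSingleThreadScheduledExecutor()

    // Reload the latest model on a fixed schedule while the job keeps running.
    scheduler.scheduleAtFixedRate(new Runnable {
      override def run(): Unit = current.set(loadModel())
    }, intervalMinutes, intervalMinutes, TimeUnit.MINUTES)

    def predict(features: Array[Double]): Double = current.get()(features)
  }
}
{code}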



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11112) Support pipeline import/export

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11112:


 Summary: Support pipeline import/export
 Key: FLINK-11112
 URL: https://issues.apache.org/jira/browse/FLINK-11112
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


It is quite common to have one job for training that periodically exports the 
trained pipeline and models, and another job that loads the exported pipeline 
for inference.

Thus, we need pipeline import/export functionality. This shall work in both 
streaming and batch environments. 
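A rough sketch of what the import/export surface could look like; all names here (PersistablePipeline, toJson, PipelineIO) are assumptions for illustration, not the actual design. The point is that a pipeline serializes to a plain string/file, so the same representation can be produced by a batch training job and loaded by a streaming inference job:

{code:scala}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical: a fitted pipeline that can render itself (stages + parameters)
// as a JSON string and be rebuilt from one.
trait PersistablePipeline {
  def toJson: String
}

object PipelineIO {
  def exportPipeline(pipeline: PersistablePipeline, path: String): Unit =
    Files.write(Paths.get(path), pipeline.toJson.getBytes(StandardCharsets.UTF_8))

  def importPipeline(path: String,
                     fromJson: String => PersistablePipeline): PersistablePipeline =
    fromJson(new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8))
}
{code}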



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11111) Create a new set of parameters

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11111:


 Summary: Create a new set of parameters
 Key: FLINK-11111
 URL: https://issues.apache.org/jira/browse/FLINK-11111
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


One goal of the new Table-based ML Pipeline is to make tooling easy. That is, 
any ML/AI algorithm that adapts to this ML Pipeline standard shall declare all 
of its parameters via a well-defined interface, so that an AI platform can 
uniformly get/set the corresponding parameters while remaining agnostic about 
the details of the specific algorithm. From a user's perspective, the only 
difference between algorithms is the name; all other algorithm parameters are 
self-descriptive.

This will also be useful for future Flink ML SQL, as the SQL parser can then 
handle all of these parameters uniformly, which greatly simplifies the parser.
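A minimal sketch of such a uniform parameter interface (the names ParamInfo, Params and WithParams are assumptions, not the final API): every algorithm exposes its parameters through the same get/set surface, keyed by name, so tooling never needs algorithm-specific code:

{code:scala}
// Describes one parameter: its name, a human-readable description and a default.
case class ParamInfo[T](name: String, description: String, defaultValue: Option[T])

// A uniform bag of parameter values keyed by parameter name.
class Params {
  private var values: Map[String, Any] = Map.empty

  def set[T](info: ParamInfo[T], value: T): Params = {
    values += (info.name -> value)
    this
  }

  def get[T](info: ParamInfo[T]): T =
    values.get(info.name).map(_.asInstanceOf[T])
      .orElse(info.defaultValue)
      .getOrElse(throw new IllegalArgumentException(s"Parameter '${info.name}' is not set"))
}

// Mixed into every pipeline stage so that a platform (or a SQL parser) can
// get/set parameters without knowing anything about the concrete algorithm.
trait WithParams {
  val params: Params = new Params
}
{code}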



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11109) Create Table based optimizers

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11109:


 Summary: Create Table based optimizers
 Key: FLINK-11109
 URL: https://issues.apache.org/jira/browse/FLINK-11109
 Project: Flink
  Issue Type: Sub-task
Reporter: Weihua Jiang


The existing optimizers in the org.apache.flink.ml package are DataSet based. 
This task is to create a new set of optimizers that are Table based. 
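For illustration, the new optimizers could share an interface roughly like the one below; the trait and method names are assumptions, and DenseVector is reused from the existing flink.ml math package (which FLINK-11096 notes is still reusable):

{code:scala}
import org.apache.flink.ml.math.DenseVector
import org.apache.flink.table.api.Table

// Hypothetical interface: training data is a Table instead of a DataSet, so
// the same optimizer can back both batch and streaming training jobs.
trait TableOptimizer extends Serializable {
  def optimize(trainingData: Table, initialWeights: DenseVector): DenseVector
}
{code}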



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11110) Support pipeline stage type inference

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11110:


 Summary: Support pipeline stage type inference
 Key: FLINK-11110
 URL: https://issues.apache.org/jira/browse/FLINK-11110
 Project: Flink
  Issue Type: Sub-task
Reporter: Weihua Jiang


It is important for any AI platform to have type inference for each stage and 
for the pipeline as a whole. The ML pipeline can support this kind of type 
inference (static typing and compatibility checks) for each stage and for the 
pipeline.

That is, each stage shall declare which input types and shapes it accepts and 
which it outputs. The framework will then perform a compatibility check to 
verify that adjacent stages are compatible. 
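A sketch of how that declaration and check could look, using Flink's TableSchema to represent the input/output types; the transformSchema method and the validate helper are assumptions about the eventual interface, not the design document's API:

{code:scala}
import org.apache.flink.table.api.TableSchema

// Each stage declares, given an input schema, the output schema it produces.
// An incompatible input should make transformSchema fail fast.
trait StageTypeInference {
  def transformSchema(input: TableSchema): TableSchema
}

object PipelineTypeCheck {
  // Fold the schema through all stages at pipeline construction time, so type
  // and shape incompatibilities surface before the job is ever submitted.
  def validate(input: TableSchema, stages: Seq[StageTypeInference]): TableSchema =
    stages.foldLeft(input)((schema, stage) => stage.transformSchema(schema))
}
{code}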



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11108) Create a new set of table based ML Pipeline classes

2018-12-09 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11108:


 Summary: Create a new set of table based ML Pipeline classes
 Key: FLINK-11108
 URL: https://issues.apache.org/jira/browse/FLINK-11108
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


The main classes are:
 # PipelineStage (the trait for each pipeline stage)
 # Estimator (the training stage)
 # Transformer (the inference/feature-engineering stage)
 # Pipeline (the whole pipeline)
 # Predictor (extends Estimator, for supervised learning)

The detailed design can be found in the parent JIRA's design document; a rough 
trait-level sketch is given below.
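The sketch uses Flink's Table type for inputs and outputs; the fit/transform signatures follow common ML pipeline designs and are assumptions, not necessarily what the design document specifies:

{code:scala}
import org.apache.flink.table.api.Table

// The trait for each pipeline stage.
trait PipelineStage extends Serializable

// Inference / feature-engineering stage: maps a Table to a Table.
trait Transformer extends PipelineStage {
  def transform(input: Table): Table
}

// Training stage: learns from a Table and produces a Transformer (the model).
trait Estimator[M <: Transformer] extends PipelineStage {
  def fit(input: Table): M
}

// Supervised learning: an Estimator whose fitted model is again a Transformer.
trait Predictor[M <: Transformer] extends Estimator[M]

// The whole pipeline: fit each Estimator in order, feeding every stage the
// output of the stages before it, and collect the resulting Transformers.
class Pipeline(stages: Seq[PipelineStage]) {
  def fit(input: Table): Seq[Transformer] = {
    var current = input
    stages.map {
      case estimator: Estimator[_] =>
        val model = estimator.fit(current)
        current = model.transform(current)
        model
      case transformer: Transformer =>
        current = transformer.transform(current)
        transformer
    }
  }
}
{code}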



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11096) Create a new table based flink ML package

2018-12-07 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11096:


 Summary: Create a new table based flink ML package
 Key: FLINK-11096
 URL: https://issues.apache.org/jira/browse/FLINK-11096
 Project: Flink
  Issue Type: Sub-task
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


Currently, the DataSet-based ML library lives in the _org.apache.flink.ml_ 
Scala package, under the _flink-libraries/flink-ml_ directory.

 

There are two questions related to packaging:
 # Shall we create a new Scala/Java package, e.g. org.apache.flink.table.ml, 
or stay in org.apache.flink.ml?
 # Shall we keep the new code in the flink-libraries/flink-ml directory, or 
create a new one, e.g. flink-libraries/flink-table-ml, with a corresponding 
Maven module?

 

I implemented a prototype of the design and found that it is very hard to fit 
into the existing flink.ml codebase. The existing flink.ml code is tightly 
coupled with the DataSet API, so I had to rewrite almost all of flink.ml to 
get a sample case working. The only reusable code from flink.ml is the base 
math classes under the _org.apache.flink.ml.math_ and 
_org.apache.flink.ml.metrics.distance_ packages.

Considering this, I would prefer to create a new package, 
org.apache.flink.table.ml, and a new Maven module, flink-table-ml.

 

Please feel free to give your feedback. 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11095) Table based ML Pipeline

2018-12-07 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-11095:


 Summary: Table based ML Pipeline
 Key: FLINK-11095
 URL: https://issues.apache.org/jira/browse/FLINK-11095
 Project: Flink
  Issue Type: New Feature
  Components: Machine Learning Library, Table API & SQL
Reporter: Weihua Jiang


As the Table API will be the unified high-level API for both batch and 
streaming, it is desirable to have a Table API based ML Pipeline definition.

This new Table-based ML Pipeline shall:
 # support unified batch/stream ML/AI functionality (training/inference).
 # integrate seamlessly with the Flink Table-based ecosystem.
 # provide a base for further Flink-based AI platform/tooling support.
 # provide a base for further Flink ML SQL integration.

The initial design is here: 
[https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing].

Based on this design, I have done some initial implementation/prototyping and 
will share the code later.

This is the umbrella JIRA; I will create a corresponding sub-JIRA for each 
sub-task. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job

2017-11-09 Thread Weihua Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Jiang updated FLINK-8020:

Attachment: jstack53009(2).out

Attached a new jstack file.

After moving our collector.collect() call out of our lock region, the deadlock 
was gone. However, the system is still NOT able to run, because we found that 
the Flink collector hangs. 

In the attached jstack53009(2).out file, the thread "cache-process0 -> 
async-operator0 -> Sink: hbase-sink0 (1/10)" holds the lock 0x0007b692ad98 
and is still waiting for this exact lock again, which makes it hang. 

Our guess is that this is some issue related to async I/O. After changing our 
async Redis lookup to a synchronous Redis lookup (i.e. not using async I/O), 
the issue is gone.
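For illustration only (this is not the original job code; the names cacheLock, cache and the enrichment logic are made up), the fix pattern described above amounts to emitting records outside the user-held lock so that collect() can never block while the lock is held:

{code:scala}
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// Generic illustration of the fix pattern, not the actual cache-process0 code.
class CacheProcess extends ProcessFunction[String, String] {
  private val cacheLock = new Object
  private var cache: Map[String, String] = Map.empty

  override def processElement(value: String,
                              ctx: ProcessFunction[String, String]#Context,
                              out: Collector[String]): Unit = {
    // Do the stateful work under our own lock...
    val enriched = cacheLock.synchronized {
      cache += (value -> value)
      s"$value:${cache.size}"
    }
    // ...but call collect() OUTSIDE the lock region, so the collector can
    // never be blocked by another thread while we hold cacheLock.
    out.collect(enriched)
  }
}
{code}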

> Deadlock found in Flink Streaming job
> -
>
> Key: FLINK-8020
> URL: https://issues.apache.org/jira/browse/FLINK-8020
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Streaming, Streaming Connectors
>Affects Versions: 1.3.2
> Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>Reporter: Weihua Jiang
>Priority: Blocker
> Attachments: jstack53009(2).out, jstack67976-2.log
>
>
> Our streaming job ran into trouble recently after a long period of smooth 
> running. One issue we found is 
> [https://issues.apache.org/jira/browse/FLINK-8019] and another is this one.
> After analyzing the jstack, we believe we found a DEADLOCK in Flink:
> 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
> holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
> 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
> hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
> 0x0007b6aa1788. 
> This DEADLOCK made the job unable to proceed. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-8020) Deadlock found in Flink Streaming job

2017-11-08 Thread Weihua Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243538#comment-16243538
 ] 

Weihua Jiang commented on FLINK-8020:
-

Sorry, after further analysis, it seems the deadlock is caused by mixing our 
own lock with Flink's lock. After reorganizing our code, the deadlock can be 
eliminated. 

> Deadlock found in Flink Streaming job
> -
>
> Key: FLINK-8020
> URL: https://issues.apache.org/jira/browse/FLINK-8020
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Streaming, Streaming Connectors
>Affects Versions: 1.3.2
> Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>Reporter: Weihua Jiang
>Priority: Blocker
> Attachments: jstack67976-2.log
>
>
> Our streaming job ran into trouble recently after a long period of smooth 
> running. One issue we found is 
> [https://issues.apache.org/jira/browse/FLINK-8019] and another is this one.
> After analyzing the jstack, we believe we found a DEADLOCK in Flink:
> 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
> holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
> 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
> hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
> 0x0007b6aa1788. 
> This DEADLOCK made the job unable to proceed. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job

2017-11-08 Thread Weihua Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Jiang updated FLINK-8020:

Attachment: jstack67976-2.log

> Deadlock found in Flink Streaming job
> -
>
> Key: FLINK-8020
> URL: https://issues.apache.org/jira/browse/FLINK-8020
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Streaming, Streaming Connectors
>Affects Versions: 1.3.2
> Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>Reporter: Weihua Jiang
>Priority: Blocker
> Attachments: jstack67976-2.log
>
>
> Our streaming job ran into trouble recently after a long period of smooth 
> running. One issue we found is [#FLINK-8019] and another is this one.
> After analyzing the jstack, we believe we found a DEADLOCK in Flink:
> 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
> holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
> 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
> hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
> 0x0007b6aa1788. 
> This DEADLOCK made the job unable to proceed. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8020) Deadlock found in Flink Streaming job

2017-11-08 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-8020:
---

 Summary: Deadlock found in Flink Streaming job
 Key: FLINK-8020
 URL: https://issues.apache.org/jira/browse/FLINK-8020
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector, Streaming, Streaming Connectors
Affects Versions: 1.3.2
 Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
Reporter: Weihua Jiang
Priority: Blocker
 Attachments: jstack67976-2.log

Our streaming job ran into trouble recently after a long period of smooth 
running. One issue we found is [#FLINK-8019] and another is this one.

After analyzing the jstack, we believe we found a DEADLOCK in Flink:
1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
0x0007b6aa1788. 

This DEADLOCK made the job unable to proceed. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-8020) Deadlock found in Flink Streaming job

2017-11-08 Thread Weihua Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Jiang updated FLINK-8020:

Description: 
Our streaming job ran into trouble recently after a long period of smooth 
running. One issue we found is 
[https://issues.apache.org/jira/browse/FLINK-8019] and another is this one.

After analyzing the jstack, we believe we found a DEADLOCK in Flink:
1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
0x0007b6aa1788. 

This DEADLOCK made the job unable to proceed. 

  was:
Our streaming job ran into trouble recently after a long period of smooth 
running. One issue we found is [#FLINK-8019] and another is this one.

After analyzing the jstack, we believe we found a DEADLOCK in Flink:
1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
0x0007b6aa1788. 

This DEADLOCK made the job unable to proceed. 


> Deadlock found in Flink Streaming job
> -
>
> Key: FLINK-8020
> URL: https://issues.apache.org/jira/browse/FLINK-8020
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Streaming, Streaming Connectors
>Affects Versions: 1.3.2
> Environment: Kafka 0.8.2 and Flink 1.3.2 on YARN mode
>Reporter: Weihua Jiang
>Priority: Blocker
> Attachments: jstack67976-2.log
>
>
> Our streaming job ran into trouble recently after a long period of smooth 
> running. One issue we found is 
> [https://issues.apache.org/jira/browse/FLINK-8019] and another is this one.
> After analyzing the jstack, we believe we found a DEADLOCK in Flink:
> 1. The thread "cache-process0 -> async-operator0 -> Sink: hbase-sink0 (8/8)" 
> holds lock 0x0007b6aa1788 and is waiting for lock 0x0007b6aa1940.
> 2. The thread "Time Trigger for cache-process0 -> async-operator0 -> Sink: 
> hbase-sink0 (8/8)" holds lock 0x0007b6aa1940 and is waiting for lock 
> 0x0007b6aa1788. 
> This DEADLOCK made the job unable to proceed. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (FLINK-8019) Flink streaming job stopped at consuming Kafka data

2017-11-07 Thread Weihua Jiang (JIRA)
Weihua Jiang created FLINK-8019:
---

 Summary: Flink streaming job stopped at consuming Kafka data
 Key: FLINK-8019
 URL: https://issues.apache.org/jira/browse/FLINK-8019
 Project: Flink
  Issue Type: Bug
  Components: Kafka Connector
Affects Versions: 1.3.2
 Environment: We are using Kafka 0.8.2.1 and Flink 1.3.2 on YARN mode.
Reporter: Weihua Jiang


Our Flink streaming job consumes data from Kafka and has worked well for a 
long time. However, in recent days it has stopped consuming data from Kafka 
several times. 

The jstack shows that it is stuck in LocalBufferPool.requestBuffer. The jstack 
file is attached. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (FLINK-8019) Flink streaming job stopped at consuming Kafka data

2017-11-07 Thread Weihua Jiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weihua Jiang updated FLINK-8019:

Attachment: jstack-20171108-2.log

> Flink streaming job stopped at consuming Kafka data
> ---
>
> Key: FLINK-8019
> URL: https://issues.apache.org/jira/browse/FLINK-8019
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector
>Affects Versions: 1.3.2
> Environment: We are using Kafka 0.8.2.1 and Flink 1.3.2 on YARN mode.
>Reporter: Weihua Jiang
> Attachments: jstack-20171108-2.log
>
>
> Our Flink streaming job consumes data from Kafka and has worked well for a 
> long time. However, in recent days it has stopped consuming data from Kafka 
> several times. 
> The jstack shows that it is stuck in LocalBufferPool.requestBuffer. The 
> jstack file is attached. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)