Hi there, I am not a machine learning expert :) But recently I have seen more and more adoption of and momentum behind TensorFlow [1], which is backed by Google and other big vendors.
If Flink were somehow compatible and could run TensorFlow pipelines (with some modifications being fine), I think adoption would be faster.

Thanks,
Chen

[1] https://github.com/tensorflow/tensorflow

On Fri, Mar 17, 2017 at 7:44 AM, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:

> > What should be the way of work here? We could have sketches for the separate projects in Gdocs, then the shepherds could make a proposal out of it. Would that be feasible?
>
> That's what I was thinking as well. It's the responsibility of the shepherd to engage the people motivated to work on a project, starting with a rough Gdocs document and gradually transitioning it into a proper design doc.
>
> As an example use case (for both online and "fast-batch") I would recommend an ad-click scenario: predicting CTR (click-through rate).
>
> There are multiple reasons I like this application:
>
> - it's a very popular application,
> - it's directly tied to revenue, so even small improvements are relevant,
> - it can often be a very large-scale problem in data and model size,
> - there are good systems out there already to benchmark against, like Vowpal Wabbit,
> - at least one large-scale dataset exists [1],
> - we could even place a pre-processing pipeline in front to emulate a real application, and show the full benefits of using Flink as your one-stop shop for an integrated prediction pipeline (up until model serving, for now).
>
> We are still missing someone to take the lead on the model serving project; if somebody is interested in coordinating that, let us know.
>
> Regards,
> Theodore
>
> [1] Criteo click-through data (1TB):
> http://www.criteo.com/news/press-releases/2015/06/criteo-releases-industrys-largest-ever-dataset/
>
> On Thu, Mar 16, 2017 at 11:50 PM, Gábor Hermann <m...@gaborhermann.com> wrote:
>
> > @Theodore: thanks for bringing the discussion together.
> > I think it's reasonable to go on in all three directions, just as you suggested. I agree we should concentrate our efforts, but we can do a low-effort evaluation of all three.
> >
> > I would like to volunteer for shepherding *Offline learning on Streaming*. I am already working on related issues, and I believe I have a fairly good overview of the streaming API and its limitations. However, we need to find a good use case to aim for, and I don't have one in mind yet, so please help with that if you can. I absolutely agree with Theodore that setting the scope is the most important thing here.
> >
> > We should find a simple use case for incremental learning. As Flink is really strong in low-latency data processing, the best would be a use case where rapidly adapting the model to new data provides value. We should also consider low-latency serving for such a use case, as there is not much use in fast model updates if we cannot serve the predictions just as fast. Of course, it's okay to simply implement offline algorithms, but showcasing would be easier if we could add prediction serving for the model in the same system.
> >
> > What should be the way of work here? We could have sketches for the separate projects in Gdocs, then the shepherds could make a proposal out of it. Would that be feasible?
> >
> > @Stephan:
> > Thanks for all your insights. I also like the approach of aiming for new and somewhat unexplored areas.
> > I guess we can do that with both the serving/evaluation and incremental training (which should be in the scope of offline ML on streaming).
> >
> > I agree that GPU acceleration is an important issue; however, it might be out of scope for the prototypes of these new ML directions. What do you think?
> >
> > Regarding your comments on the other thread, I'm really glad the PMC is working towards growing the community. This is crucial for getting anything merged into Flink while keeping up the code quality. However, for the prototypes, I'd prefer Theodore's suggestion to do it in a separate repository, to make initial development faster. After the prototypes have proven their usability we could merge them, and continue working on them inside the Flink repository. But we can decide that later.
> >
> > Cheers,
> > Gabor
> >
> >
> > On 2017-03-14 21:04, Stephan Ewen wrote:
> >
> >> Thanks Theo. Just wrote some comments on the other thread, but it looks like you got it covered already.
> >>
> >> Let me re-post what I think may help as input:
> >>
> >> *Concerning Model Evaluation / Serving*
> >>
> >>   - My personal take is that the "model evaluation" over streams will be happening in any case - there is genuine interest in that, and various users have built it themselves already. It would be a cool way to do something that has a very high chance of being productionized by users soon.
> >>
> >>   - The model evaluation as one step of a streaming pipeline (classifying events), followed by CEP (pattern detection) or anomaly detection, is a valuable use case on top of what pure model serving systems usually do.
> >>
> >>   - A question I do not yet have a good intuition on is whether the "model evaluation" and the training part are so different (once a good abstraction for model evaluation has been built) that there is little cross-coordination needed, or whether there is potential in integrating them.
> >>
> >>
> >> *Thoughts on the ML training library (DataSet API or DataStream API)*
> >>
> >>   - I honestly don't quite understand what the big difference will be in targeting the batch or streaming API. You can use the DataSet API in a quite low-level fashion (missing async iterations).
> >>
> >>   - There seems, especially now, to be a big trend towards deep learning (is it just temporary or will this be the future?), and in that space little works without GPU acceleration.
> >>
> >>   - It is always easier to do something new than to be the n-th version of something existing (sorry for the generic truism). The latter admittedly gives the "all in one integrated framework" advantage (which can be a very strong argument indeed), but the former attracts completely new communities and can often make more impact with less effort.
> >>
> >>   - The "new" is not required to be "online learning", where Theo has described some concerns well. It can also be traditional ML re-imagined for "continuous applications", as "continuous / incremental re-training" or so. Even on the "model evaluation" side there is a lot of interesting stuff, as mentioned already, like ensembles, multi-armed bandits, ...
> >>   - It may be well worth tapping into the work of an existing library (like TensorFlow) for an easy fix to some hard problems (pre-existing hardware integration, pre-existing optimized linear algebra solvers, etc.) and thinking about how such use cases would look in the context of typical Flink applications.
> >>
> >>
> >> *A bit of engine background information that may help in the planning:*
> >>
> >>   - The DataStream API will in the future also support bounded data computations explicitly (I say this not as a fact, but as a strong believer that this is the right direction).
> >>
> >>   - Batch runtime execution has seen less focus recently, but seems to get a bit more community focus, because some organizations that contribute a lot want to use the batch side as well. For example, the effort on fine-grained recovery will strengthen batch a lot already.
> >>
> >>
> >> Stephan
> >>
> >>
> >> On Tue, Mar 14, 2017 at 1:38 PM, Theodore Vasiloudis <theodoros.vasilou...@gmail.com> wrote:
> >>
> >>> Hello all,
> >>>
> >>> ## Executive summary:
> >>>
> >>>   - Offline-on-streaming is most popular, then online learning and model serving.
> >>>   - We need shepherds to lead development/coordination of each task.
> >>>   - I can shepherd online learning; we need shepherds for the other two.
> >>>
> >>>
> >>> So, from the people sharing their opinion, it seems most would like to try out offline learning with the streaming API. I also think this is an interesting option, but probably the riskiest of the bunch.
> >>>
> >>> After that, online learning and model serving seem to have around the same amount of interest.
> >>>
> >>> Given that, and the discussions we had in the Gdoc, here's what I recommend as next actions:
> >>>
> >>>   - *Offline on streaming:* Start by creating a design document, with an MVP specification of what we imagine such a library to look like and what we think should be possible to do. It should state clear goals and limitations; scoping the amount of work is more important at this point than specific engineering choices.
> >>>   - *Online learning:* If someone would like instead to work on online learning I can help out there; I have one student working on such a library right now, and I'm sure people at TU Berlin (Felix?) have similar efforts. Ideally we would like to communicate with them. Since this is a much more explored space, we could jump straight into a technical design document (with scoping included, of course), discussing abstractions and comparing with existing frameworks.
> >>>   - *Model serving:* There will be a presentation at Flink Forward SF on such a framework (Flink TensorFlow) by Eron Wright [1]. My recommendation would be to communicate with the author and see if he would be interested in working together to generalize and extend the framework. For more research and resources on the topic see [2] or this presentation [3], particularly the Clipper system.
> >>>
> >>> In order to have some activity on each project I recommend we set a minimum of 2 people willing to contribute to each project.
> >>>
> >>> If we "assign" people by top choice, that should be possible to do, although my original plan was to only work on two of the above, to avoid fragmentation. But given that online learning will have work being done by students as well, it should be possible to keep it running.
> >>>
> >>> Next, *I would like us to assign a "shepherd" for each of these tasks.* If you are willing to coordinate the development on one of these options, let us know here and you can take up the task of coordinating with the rest of the people working on it.
> >>>
> >>> I would like to volunteer to coordinate the *Online learning* effort, since I'm already supervising a student working on this, and I'm currently developing such algorithms. I plan to contribute to the offline-on-streaming task as well, but not coordinate it.
> >>>
> >>> So if someone would like to take the lead on Offline on streaming or Model serving, let us know and we can take it from there.
> >>>
> >>> Regards,
> >>> Theodore
> >>>
> >>> [1] http://sf.flink-forward.org/kb_sessions/introducing-flink-tensorflow/
> >>> [2] https://ucbrise.github.io/cs294-rise-fa16/prediction_serving.html
> >>> [3] https://ucbrise.github.io/cs294-rise-fa16/assets/slides/prediction-serving-systems-cs294-RISE_seminar.pdf
> >>>
> >>> On Fri, Mar 10, 2017 at 6:55 PM, Stavros Kontopoulos <st.kontopou...@gmail.com> wrote:
> >>>
> >>>> Thanks Theodore,
> >>>>
> >>>> I'd vote for
> >>>>
> >>>>   - Offline learning with the Streaming API
> >>>>   - Low-latency prediction serving
> >>>>
> >>>> Some comments...
> >>>>
> >>>> Online learning:
> >>>>
> >>>> Good to have, but my feeling is that it is not a strong requirement (if a requirement at all) across the industry right now. It may become hot in the future.
> >>>>
> >>>> Offline learning with the Streaming API:
> >>>>
> >>>> Although it requires engine changes or extensions (feasibility is an issue here), my understanding is that it reflects common industry practice (train every few minutes at most), and it would be great if that were supported out of the box with a friendly API for the developer.
> >>>>
> >>>> Offline learning with the batch API:
> >>>>
> >>>> I would love to have a limited set of algorithms so that someone does not have to leave Flink and work with another tool on some initial dataset if they don't want to. In other words, let's reach a mature state with some basic algos merged. There is a lot of work pending; let's not waste it.
> >>>>
> >>>> Low-latency prediction serving:
> >>>>
> >>>> Model serving is a long-standing problem; we could definitely help with that.
> >>>>
> >>>> Regards,
> >>>> Stavros
> >>>>
> >>>>
> >>>> On Fri, Mar 10, 2017 at 4:08 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> >>>>
> >>>>> Thanks Theo for steering Flink's ML effort here :-)
> >>>>>
> >>>>> I'd vote to concentrate on
> >>>>>
> >>>>>   - Online learning
> >>>>>   - Low-latency prediction serving
> >>>>>
> >>>>> because of the following reasons:
> >>>>>
> >>>>> Online learning:
> >>>>>
> >>>>> I agree that this topic is highly researchy and it's not even clear whether it will ever be of any interest outside of academia. However, it was the same for other things as well.
> >>>>> Adoption in industry is usually slow, and sometimes one has to dare to explore something new.
> >>>>>
> >>>>> Low-latency prediction serving:
> >>>>>
> >>>>> Flink with its streaming engine seems to be the natural fit for such a task, and it is rather low-hanging fruit. Furthermore, I think that users would directly benefit from such a feature.
> >>>>>
> >>>>> Offline learning with the Streaming API:
> >>>>>
> >>>>> I'm not fully convinced yet that the streaming API is powerful enough (mainly due to the lack of proper iteration support and spilling capabilities) to support a wide range of offline ML algorithms. And even then it will only support rather small problem sizes, because streaming cannot gracefully spill the data to disk. There are still too many open issues with the streaming API for it to be applicable to this use case, imo.
> >>>>>
> >>>>> Offline learning with the batch API:
> >>>>>
> >>>>> For offline learning the batch API is imo still better suited than the streaming API. I think it will only make sense to port the algorithms to the streaming API once batch and streaming are properly unified. The highly efficient implementations for joining and sorting data that can go out of memory are alone important for supporting big ML problems. In general, I think it might make sense to offer a basic set of ML primitives. However, already offering this basic set is a considerable amount of work.
> >>>>>
> >>>>> Concerning the independent organization for the development: I think it would be great if the development could still happen under the umbrella of Flink's ML library, because otherwise we might risk some kind of fragmentation. In order for people to collaborate, one can also open PRs against a branch of a forked repo.
> >>>>>
> >>>>> I'm currently working on wrapping up the project re-organization discussion. The general position was that it would be best to have an incremental build and keep everything in the same repo. If this is not possible, then we want to look into creating a sub-repository for the libraries (maybe other components will follow later). I hope to make some progress on this front in the next couple of days/weeks. I'll keep you updated.
> >>>>>
> >>>>> As a general remark for the discussions on the Google doc: I think it would be great if we could at least mirror the discussions happening in the Google doc back on the mailing list, or ideally conduct the discussions directly on the mailing list. That's at least what the ASF encourages.
> >>>>>
> >>>>> Cheers,
> >>>>> Till
> >>>>>
> >>>>> On Fri, Mar 10, 2017 at 10:52 AM, Gábor Hermann <m...@gaborhermann.com> wrote:
> >>>>>
> >>>>>> Hey all,
> >>>>>>
> >>>>>> Sorry for the somewhat late response.
> >>>>>>
> >>>>>> I'd like to work on
> >>>>>>   - Offline learning with the Streaming API
> >>>>>>   - Low-latency prediction serving
> >>>>>>
> >>>>>> I would drop the batch API ML because of past experience with lack of support, and online learning because of the lack of use cases.
> >>>>>>
> >>>>>> I completely agree with Kate that offline learning should be supported, but given Flink's resources I prefer using the streaming API, as Roberto suggested. Also, the full model lifecycle (or end-to-end ML) could be more easily supported in one system (one API). Connecting Flink Batch with Flink Streaming is currently cumbersome (although side inputs [1] might help). In my opinion, a crucial part of end-to-end ML is low-latency predictions.
> >>>>>>
> >>>>>> As another direction, we could integrate the Flink Streaming API with other projects (such as PredictionIO). However, I believe it's better to first evaluate the capabilities and drawbacks of the streaming API with some prototype of using Flink Streaming for some ML task. Otherwise we could run into critical issues, just as the SystemML integration did with e.g. caching. These issues make the integration of the Batch API with other ML projects practically infeasible.
> >>>>>>
> >>>>>> I've already been experimenting with offline learning with the Streaming API. Hopefully, I can share some initial performance results next week on matrix factorization. Naturally, I've run into issues. E.g. I could only mark the end of input with some hacks, because this is not needed for a streaming job consuming input forever. AFAIK, this would be resolved by side inputs [1].
> >>>>>>
> >>>>>> @Theodore:
> >>>>>> +1 for doing the prototype project(s) separately from the main Flink repository. Although, I would strongly suggest following the Flink development guidelines as closely as possible. As another note, there is already a GitHub organization for Flink-related projects [2], but it seems like it has not been used much.
> >>>>>>
> >>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API
> >>>>>> [2] https://github.com/project-flink
> >>>>>>
> >>>>>>
> >>>>>> On 2017-03-04 08:44, Roberto Bentivoglio wrote:
> >>>>>>
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> I'd like to start working on:
> >>>>>>>   - Offline learning with the Streaming API
> >>>>>>>   - Online learning
> >>>>>>>
> >>>>>>> I also think that using a new organisation on GitHub, as Theodore proposed, to keep an initial independence and speed up the prototyping and development phases is really interesting.
> >>>>>>>
> >>>>>>> I totally agree with Katherin that we need offline learning, but my opinion is that it will be more straightforward to fix the streaming issues than the batch issues, because we will have more support on that from the Flink community.
> >>>>>>>
> >>>>>>> Thanks and have a nice weekend,
> >>>>>>> Roberto
> >>>>>>>
> >>>>>>> On 3 March 2017 at 20:20, amir bahmanyari <amirto...@yahoo.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> Great points to start:
> >>>>>>>>   - Online learning
> >>>>>>>>   - Offline learning with the streaming API
> >>>>>>>>
> >>>>>>>> Thanks + have a great weekend.
> >>>>>>>>
> >>>>>>>> From: Katherin Eri <katherinm...@gmail.com>
> >>>>>>>> To: dev@flink.apache.org
> >>>>>>>> Sent: Friday, March 3, 2017 7:41 AM
> >>>>>>>> Subject: Re: Machine Learning on Flink - Next steps
> >>>>>>>>
> >>>>>>>> Thank you, Theodore.
> >>>>>>>>
> >>>>>>>> Shortly speaking, I vote for:
> >>>>>>>> 1) Online learning
> >>>>>>>> 2) Low-latency prediction serving -> Offline learning with the batch API
> >>>>>>>>
> >>>>>>>> In detail:
> >>>>>>>> 1) If streaming is the strong side of Flink, let's use it and try to support some online learning or lightweight in-memory learning algorithms, and try to build a pipeline for them.
> >>>>>>>>
> >>>>>>>> 2) I think that Flink should be part of the production ecosystem, and if production systems now require ML support, deployment of multiple models, and so on, we should serve this. But in my opinion we shouldn't compete with projects like PredictionIO, but rather serve them and be an execution core. That means a lot, though:
> >>>>>>>>
> >>>>>>>> a. Offline training should be supported, because typically most ML algs are for offline training.
> >>>>>>>> b. The model lifecycle should be supported: ETL + transformation + training + scoring + monitoring of quality in production.
> >>>>>>>>
> >>>>>>>> I understand that the batch world is full of competitors, but for me that doesn't mean that batch should be ignored. I think that separate streaming/batch applications cause additional deployment and operational overhead, which one typically tries to avoid. That means we should attract the community to this problem, in my opinion.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Fri, 3 Mar 2017 at 15:34, Theodore Vasiloudis <theodoros.vasilou...@gmail.com>:
> >>>>>>>>
> >>>>>>>> Hello all,
> >>>>>>>>
> >>>>>>>> From our previous discussion started by Stavros, we decided to start a planning document [1] to figure out possible next steps for ML on Flink.
> >>>>>>>>
> >>>>>>>> Our concerns were mainly ensuring active development while satisfying the needs of the community.
> >>>>>>>>
> >>>>>>>> We have listed a number of proposals for future work in the document.
> >>>>>>>> In short they are:
> >>>>>>>>
> >>>>>>>>   - Offline learning with the batch API
> >>>>>>>>   - Online learning
> >>>>>>>>   - Offline learning with the streaming API
> >>>>>>>>   - Low-latency prediction serving
> >>>>>>>>
> >>>>>>>> I saw there are a number of people willing to work on ML for Flink, but the truth is that we cannot cover all of these suggestions without fragmenting the development too much.
> >>>>>>>>
> >>>>>>>> So my recommendation is to pick out 2 of these options, create design documents, and build prototypes for each library. We can then assess their viability and, together with the community, decide if we should try to include one (or both) of them in the main Flink distribution.
> >>>>>>>>
> >>>>>>>> So I invite people to express their opinion about which task they would be willing to contribute to, and hopefully we can settle on two of these options.
> >>>>>>>>
> >>>>>>>> Once that is done we can decide how we do the actual work. Since this is highly experimental, I would suggest we work on repositories where we have complete control. For that purpose I have created an organization [2] on GitHub which we can use to create repositories and teams that work on them in an organized manner. Once enough work has accumulated we can start discussing contributing the code to the main distribution.
> >>>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Theodore
> >>>>>>>>
> >>>>>>>> [1] https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/
> >>>>>>>> [2] https://github.com/flinkml
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>>
> >>>>>>>> *Yours faithfully,*
> >>>>>>>>
> >>>>>>>> *Kate Eri.*