Thanks for your feedback! This is very valuable :) Please share your experience (positive and negative) when doing more complex stuff. And don't hesitate to ask if you have any questions.
-Matthias

On 11/21/2015 06:04 PM, Naveen Madhire wrote:

FYI, I just saw this email chain and thought of sharing my experience. I used the
Storm Flink API a few days ago. A simple example worked well; I will be testing a few
more next week.

One thing to note: I had to include all Scala dependencies in the Storm topology,
since the FlinkLocalCluster.java class uses LocalFlinkMiniCluster.scala. Not sure if
this is an issue, but after including the Scala dependencies everything worked well. ;)

On Fri, Nov 20, 2015 at 4:12 PM, Matthias J. Sax <[email protected]> wrote:

Multiple inputs per bolt are currently not supported. :( FlinkTopologyBuilder has a
bug. There is already a JIRA for it:
https://issues.apache.org/jira/browse/FLINK-2837

I already know how to fix it (I hope to get it into 0.10.1).

Removing FlinkTopologyBuilder does make sense (I did not do it because the members we
need to access are private). Your idea to get access via reflection is good!

Btw: can you also have a look here: https://github.com/apache/flink/pull/1387
I would like to merge this ASAP but need some feedback.

-Matthias

On 11/20/2015 07:30 PM, Maximilian Michels wrote:

I thought about the API changes again. It probably does make sense to keep the
LocalCluster and StormSubmitter equivalent classes. That way, we don't break the
Storm API too much. Users can stick to the pattern of using either FlinkCluster to
execute locally or FlinkSubmitter to submit remotely. Still, we can save some code by
reusing Storm's TopologyBuilder.

I'll open a pull request with the changes. This also includes some more examples and
features (e.g. multiple inputs per bolt).

On Mon, Nov 16, 2015 at 4:33 PM, Maximilian Michels <[email protected]> wrote:

You are right in saying that both API approaches support executing Storm jobs.
However, I think the proposed changes make it much easier to reuse Storm topologies.
And here is why:

1. No existing classes need to be exchanged.

A Storm topology stays as it is. If you already have it defined somewhere, you simply
pass it to the FlinkTopologyBuilder to create a StreamExecutionEnvironment.

2. Storm and Flink have different runtime behavior.

IMHO it makes more sense to make it transparent to the user that the result of the
translation is an actual Flink job executed by the Flink runtime. Therefore, it makes
sense to stick to the Flink way of executing. Hiding this fact behind Storm dummy
classes can create problems for the user.

3. Code reuse

As you can see in the proposed changes, it makes the implementation much simpler
while retaining the desired functionality. That also has an impact on testability and
maintainability.

I can also understand your perspective. I wonder if we could get some feedback from
other people on the mailing list?

Let me also address your other comments and suggestions:

> * You changed the examples to use finite spouts -- from a testing point of view this
> makes sense. However, the examples should show how to run an *unmodified* Storm
> topology in Flink.

Good point. As far as I know we only test finite sources in the Flink streaming
tests. Using finite sources makes things much easier. I would like to keep the tests
simple like this. We can still have separate tests for the infinite behavior of the
regular spouts. The examples can be converted back to using the infinite spout. IMHO
the existing approach, which involves waiting and then killing the topology, doesn't
seem to be the cleanest solution.
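To make the finite-spout idea discussed here concrete, below is a minimal sketch of a file-reading spout with a "finished" flag, in the spirit of the FiniteFileSpout example. This is not the actual example code: the reachedEnd() method name and the FiniteSpout interface it would implement are assumptions based on this thread (hence the commented-out implements clause), and the file path is only illustrative.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class SimpleFiniteFileSpout extends BaseRichSpout /* implements FiniteSpout */ {

    private SpoutOutputCollector collector;
    private BufferedReader reader;
    private boolean finished = false;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            this.reader = new BufferedReader(new FileReader("/tmp/words.txt")); // example path
        } catch (IOException e) {
            throw new RuntimeException("cannot open input file", e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            String line = reader.readLine();
            if (line == null) {
                finished = true;                // end of file reached: stop emitting
            } else {
                collector.emit(new Values(line));
            }
        } catch (IOException e) {
            throw new RuntimeException("cannot read input file", e);
        }
    }

    // What the FiniteSpout interface would add: the source can be stopped once this returns true.
    public boolean reachedEnd() {
        return finished;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}
```

With such a spout a test can terminate deterministically once the input is exhausted, instead of waiting and then killing the topology.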
> * We should keep the local copy "unprocessedBolts" when creating a Flink program to
> allow re-submitting the same topology object twice (or altering it after
> submission). If you don't make the copy, submitting/translating the topology into a
> Flink job alters the object (which should not happen). And as it is not performance
> critical, the copying overhead does not matter.

I didn't think about that, but we can copy the spouts and bolts before processing
them. I've added that to my local branch. However, I didn't see where this was done
previously. Can you give me a hint?
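A minimal sketch of that defensive copy. The class, field, and method names below are made up for illustration (the real FlinkTopologyBuilder code is not shown here); only the copy-before-processing idea mirrors the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.IRichSpout;

public class TopologyTranslator {

    private final Map<String, IRichSpout> spouts;
    private final Map<String, IRichBolt> bolts;

    public TopologyTranslator(Map<String, IRichSpout> spouts, Map<String, IRichBolt> bolts) {
        // copy the user-supplied maps so that translating never alters the original topology object
        this.spouts = new HashMap<String, IRichSpout>(spouts);
        this.bolts = new HashMap<String, IRichBolt>(bolts);
    }

    public void translate() {
        // work on yet another local copy; entries are removed as their inputs become available,
        // so the same topology object can be translated or submitted again later
        Map<String, IRichBolt> unprocessedBolts = new HashMap<String, IRichBolt>(bolts);
        while (!unprocessedBolts.isEmpty()) {
            // ... wire up every bolt whose inputs are already translated, then remove it ...
            unprocessedBolts.clear(); // placeholder so this sketch terminates
        }
    }
}
```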
> * Why did you change the dop from 4 to 1 in WordCountTopology? We should test in a
> parallel fashion...

Absolutely. Already reverted this locally.

> * Too many reformatting changes ;) You touched many classes without any actual code
> changes.

Yes, I ran "Optimize Imports" in IntelliJ. Sorry for that, but this only affects the
import statements.

I would like to open a pull request soon to merge some of the changes. It would be
great if some other people commented on the API changes and whether we should
integrate direct support for spouts/bolts in DataStream. Next, I would like to test
and bundle some more of the examples included in Storm.

Cheers,
Max

On Sat, Nov 14, 2015 at 5:13 PM, Matthias J. Sax <[email protected]> wrote:

I just had a look at your proposal. It makes a lot of sense. I still believe that it
is a matter of taste whether one prefers your or my point of view. Both approaches
allow us to easily reuse and execute Storm topologies on Flink (which is the most
important feature we need to have).

I hope to get some more feedback from the community on whether the Storm
compatibility layer should be more "stormy" or more "flinky". Both approaches make
sense to me.

A few minor comments:

* FileSpout vs FiniteFileSpout
FileSpout was implemented in a Storm way -- setting the "finished" flag there does
not make sense from a Storm point of view (there is no such thing as a finite spout).
Thus, this example shows how a regular Storm spout can be improved using the
FiniteSpout interface. I would keep it as is (even if it seems unnecessarily
complicated -- imagine that you don't have the code of FileSpout).

* You changed the examples to use finite spouts -- from a testing point of view this
makes sense. However, the examples should show how to run an *unmodified* Storm
topology in Flink.

* We should keep the local copy "unprocessedBolts" when creating a Flink program to
allow re-submitting the same topology object twice (or altering it after submission).
If you don't make the copy, submitting/translating the topology into a Flink job
alters the object (which should not happen). And as it is not performance critical,
the copying overhead does not matter.

* Why did you change the dop from 4 to 1 in WordCountTopology? We should test in a
parallel fashion...

* Too many reformatting changes ;) You touched many classes without any actual code
changes.


-------- Forwarded Message --------
Subject: Re: Storm Compatibility
Date: Fri, 13 Nov 2015 12:15:19 +0100
From: Maximilian Michels <[email protected]>
To: Matthias J. Sax <[email protected]>
CC: Stephan Ewen <[email protected]>, Robert Metzger <[email protected]>

Hi Matthias,

Thank you for your remarks.

I believe the goal of the compatibility layer should not be to mimic Storm's API but
to easily execute Storm topologies using Flink. I see that it is easy for users to
use class names for execution that they know from Storm, but I think this makes the
API verbose. I've refactored it a bit to make it more aligned with Flink's execution
model. After all, the most important thing is that it makes it easy for people to
reuse Storm topologies while getting all the advantages of Flink.

Let me explain what I have done so far:
https://github.com/apache/flink/compare/master...mxm:storm-dev

API
- Remove FlinkClient, FlinkSubmitter, FlinkLocalCluster, FlinkTopology: they are not
necessary in my opinion and replicate functionality already included in Flink or Storm.
- Build the topology with the Storm TopologyBuilder (instead of FlinkTopology), which
is then passed to the FlinkTopologyBuilder, which generates the
StreamExecutionEnvironment containing the StreamGraph. You can then simply call
execute() like you usually would in Flink. This lets you reuse your Storm topologies
with the ease of Flink's context-based execution mechanism. Note that it works in
local and remote execution mode without changing any code.

Tests
- Replaced StormTestBase.java with StreamingTestBase
- Use a finite source for the tests and changed it a bit

Examples
- Convert examples to the new API
- Remove duplicate examples (local and remote)

I hope these changes are not too invasive for you. I think they make the
compatibility layer much easier to use. Let me know what you think about it. Of
course, we can iterate on it.

About the integration of the compatibility layer into DataStream: wouldn't it be
possible to set the Storm dependency to "provided" and let users include the jar if
they want to use the Storm compatibility layer? That's also what we do for other
libraries like Gelly. You have to package them into the JAR if you want to run them
on the cluster. We should give a good error message if classes cannot be found.

+1 for moving the discussion to the dev list.

Cheers,
Max
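To illustrate the usage pattern described above, here is a minimal sketch of what a program could look like under the proposed changes. The factory method name createExecutionEnvironment() and the MyWordCountBolt class are placeholders, not the actual API; the proposal only states that the Storm-built topology is handed to FlinkTopologyBuilder, which produces a StreamExecutionEnvironment to execute().

```java
import backtype.storm.topology.TopologyBuilder;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StormWordCountOnFlink {

    public static void main(String[] args) throws Exception {
        // plain Storm API -- an existing topology definition stays untouched
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("lines", new SimpleFiniteFileSpout());   // spout from the sketch above
        builder.setBolt("counter", new MyWordCountBolt(), 4)      // placeholder for any existing Storm bolt
               .shuffleGrouping("lines");

        // hypothetical entry point: translate the Storm topology into a Flink program
        StreamExecutionEnvironment env = FlinkTopologyBuilder.createExecutionEnvironment(builder);

        // one call covers local and remote execution, as in any Flink job
        env.execute("storm-wordcount-on-flink");
    }
}
```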
On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <[email protected]> wrote:

One more thing that just came to my mind about (1); I have to correct my last reply
on it:

We **cannot reuse** TopologyBuilder, because the StormTopology returned by
.createTopology() does **not** contain the references to the spout/bolt objects.
Internally, those are already serialized into an internal Thrift representation (as
preparation for being sent to Nimbus). However, in order to create a Flink job, we
need the references, of course...

-Matthias
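Since createTopology() only hands back the Thrift-serialized form, the reflection idea mentioned earlier in the thread would read the original spout/bolt objects straight out of the builder. A minimal sketch follows; the private field names "_spouts" and "_bolts" are an assumption about Storm 0.9.x internals and would have to be checked against the actual TopologyBuilder source.

```java
import java.lang.reflect.Field;
import java.util.Map;

import backtype.storm.topology.TopologyBuilder;

public final class TopologyBuilderAccess {

    // Reads one of TopologyBuilder's private maps via reflection.
    @SuppressWarnings("unchecked")
    public static Map<String, Object> readPrivateMap(TopologyBuilder builder, String fieldName)
            throws NoSuchFieldException, IllegalAccessException {
        Field field = TopologyBuilder.class.getDeclaredField(fieldName);
        field.setAccessible(true);                        // bypass the private modifier
        return (Map<String, Object>) field.get(builder);
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... setSpout()/setBolt() calls as usual ...
        Map<String, Object> spouts = readPrivateMap(builder, "_spouts"); // assumed field name
        Map<String, Object> bolts = readPrivateMap(builder, "_bolts");   // assumed field name
        System.out.println(spouts.size() + " spouts, " + bolts.size() + " bolts");
    }
}
```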
>>>>>>>>> >>>>>>>>> To that end, some people from us will start testing the API, adding >>>>>>>>> fixes, etc (which also helps us to understand this better when >> users ask >>>>>>>>> questions). >>>>>>>>> We would definitely like for you to stay involved (we don't want to >>>>>>>>> hijack this), and help with ideas, especially when it comes to >> things >>>>>>>>> like fault tolerance design, etc. >>>>>>>>> >>>>>>>>> What do you think? >>>>>>>>> >>>>>>>>> Greetings, >>>>>>>>> Stephan >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >> >> >