I would be against adding anything Storm-specific to the core Flink APIs (streaming is part of the core as well). If we add stuff there, we have to stick to it, and I don’t see a lot of use in reusing single Bolts/Spouts.
I’m very excited about the work on Storm compatibility in general, though. :D

> On 14 Nov 2015, at 17:19, Matthias J. Sax <[email protected]> wrote:
>
> About the DataStream extension and setting the Storm dependency to provided: if this works, a big +1 from my side.
>
> -Matthias
>
>
> On 11/14/2015 05:13 PM, Matthias J. Sax wrote:
>> I just had a look at your proposal. It makes a lot of sense. I still believe that it is a matter of taste whether one prefers your or my point of view. Both approaches allow to easily reuse and execute Storm topologies on Flink (which is the most important feature we need to have).
>>
>> I hope to get some more feedback from the community on whether the Storm compatibility should be more "stormy" or more "flinky". Both approaches make sense to me.
>>
>>
>> A few minor comments:
>>
>> * FileSpout vs FiniteFileSpout
>> -> FileSpout was implemented in a Storm way -- setting the "finished" flag here does not make sense from a Storm point of view (there is no such thing as a finite spout).
>> Thus, this example shows how a regular Storm spout can be improved using the FiniteSpout interface -- I would keep it as is (even if it seems unnecessarily complicated -- imagine that you don't have the code of FileSpout).
>>
>> * You changed the examples to use finite spouts -- from a testing point of view this makes sense. However, the examples should show how to run an *unmodified* Storm topology in Flink.
>>
>> * We should keep the local copy "unprocessedBolts" when creating a Flink program, to allow re-submitting the same topology object twice (or altering it after submission). If you don't make the copy, submitting/translating the topology into a Flink job alters the object (which should not happen). And as it is not performance critical, the copying overhead does not matter.
>>
>> * Why did you change the dop from 4 to 1 in WordCountTopology? We should test in a parallel fashion...
>>
>> * Too many reformatting changes ;) You touched many classes without any actual code changes.
>>
>>
>> -------- Forwarded Message --------
>> Subject: Re: Storm Compatibility
>> Date: Fri, 13 Nov 2015 12:15:19 +0100
>> From: Maximilian Michels <[email protected]>
>> To: Matthias J. Sax <[email protected]>
>> CC: Stephan Ewen <[email protected]>, Robert Metzger <[email protected]>
>>
>> Hi Matthias,
>>
>> Thank you for your remarks.
>>
>> I believe the goal of the compatibility layer should not be to mimic Storm's API but to easily execute Storm topologies using Flink. I see that it is easy for users to use the class names they know from Storm for execution, but I think this makes the API verbose. I've refactored it a bit to make it more aligned with Flink's execution model. After all, the most important thing is that it makes it easy for people to reuse Storm topologies while getting all the advantages of Flink.
>>
>> Let me explain what I have done so far:
>> https://github.com/apache/flink/compare/master...mxm:storm-dev
>>
>> API
>> - Remove FlinkClient, FlinkSubmitter, FlinkLocalCluster, FlinkTopology: they are not necessary in my opinion and replicate functionality already included in Flink or Storm.
>>
>> - Build the topology with the Storm TopologyBuilder (instead of FlinkTopology), which is then passed to the FlinkTopologyBuilder, which generates the StreamExecutionEnvironment containing the StreamGraph. You can then simply call execute() like you would usually do in Flink.
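
A rough sketch of the flow described above. This is illustrative only: the FlinkTopologyBuilder entry point (here called createStreamEnvironment) and the example spout/bolt classes are assumptions based on the description in this thread, not the actual API, and their imports are omitted.

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class StormWordCountOnFlink {

        public static void main(String[] args) throws Exception {
            // 1) Build the topology with Storm's own TopologyBuilder -- plain,
            //    unmodified Storm code (spout/bolt classes are placeholders).
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("source", new FileSpout(args[0]));
            builder.setBolt("tokenizer", new BoltTokenizer(), 4)
                   .shuffleGrouping("source");
            builder.setBolt("counter", new BoltCounter(), 4)
                   .fieldsGrouping("tokenizer", new Fields("word"));

            // 2) Hand the Storm builder to the compatibility layer, which translates
            //    it into a StreamExecutionEnvironment containing the StreamGraph.
            //    (The method name is an assumption.)
            StreamExecutionEnvironment env =
                    FlinkTopologyBuilder.createStreamEnvironment(builder);

            // 3) Execute like any other Flink job. Context-based execution means the
            //    same code runs locally or on a remote cluster without changes.
            env.execute("storm-word-count");
        }
    }

Packaged into a jar, the same main() could presumably be submitted via bin/flink run, which is what the "local and remote execution mode without changing any code" remark below refers to.
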
>> This lets you reuse your Storm topologies with the ease of Flink's context-based execution mechanism. Note that it works in local and remote execution mode without changing any code.
>>
>> Tests
>> - Replaced StormTestBase.java with StreamingTestBase
>> - Use a finite source for the tests and changed it a bit
>>
>> Examples
>> - Convert examples to the new API
>> - Remove duplicate examples (local and remote)
>>
>> I hope these changes are not too invasive for you. I think they make the compatibility layer much easier to use. Let me know what you think about it. Of course, we can iterate on it.
>>
>> About the integration of the compatibility layer into DataStream: wouldn't it be possible to set the Storm dependency to provided and let the user include the jar if he/she wants to use the Storm compatibility? That's also what we do for other libraries like Gelly. You have to package them into the JAR if you want to run them on the cluster. We should give a good error message if classes cannot be found.
>>
>> +1 for moving the discussion to the dev list.
>>
>> Cheers,
>> Max
>>
>> On Fri, Nov 13, 2015 at 7:41 AM, Matthias J. Sax <[email protected]> wrote:
>>> One more thing that just came to my mind about (1): I have to correct my last reply on it:
>>>
>>> We **cannot reuse** TopologyBuilder, because the StormTopology returned by .createTopology() does **not** contain the references to the Spout/Bolt objects. Internally, those are already serialized into an internal Thrift representation (as preparation to get sent to Nimbus). However, in order to create a Flink job, we need the references, of course...
>>>
>>> -Matthias
>>>
>>>
>>> On 11/11/2015 04:33 PM, Maximilian Michels wrote:
>>>> Hi Matthias,
>>>>
>>>> Sorry for getting back to you late. I'm very new to Storm but have familiarized myself a bit over the last days. While looking through the Storm examples and the compatibility layer, I discovered the following issues:
>>>>
>>>> 1) The compatibility layer mirrors the Storm API instead of reusing it. Why do we need a FlinkTopologyBuilder, FlinkCluster, FlinkSubmitter, FlinkClient? Couldn't all these user-facing classes be replaced by e.g. a StormExecutionEnvironment which receives the Storm topology and upon getStreamGraph() just traverses it?
>>>>
>>>> 2) DRPC is not yet supported. I don't know how crucial this is, but it seems to be a widespread Storm feature. If we wrapped the entire Storm topology, we could give appropriate errors when we see such unsupported features.
>>>>
>>>> 3) We could simplify embedding Spouts and Bolts directly as operator functions. Users shouldn't have to worry about extracting the types. Perhaps we could implement a dedicated method to add spouts/bolts on DataStream?
>>>>
>>>> 5) Performance: the BoltWrapper creates a StormTuple for every incoming record. I think this could be improved. Couldn't we use the StormTuple as the data type instead of Flink's tuples?
>>>>
>>>> 6) Trident examples. Have you run any?
>>>>
>>>> That's it for now. I'm sure you know about many more improvements or problems because you're the expert on this. In the meantime, I'll try to contact you via IRC.
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Fri, Nov 6, 2015 at 6:25 PM, Matthias J. Sax <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> that sounds great! I am very happy that people are interested in it and start to use it! Can you give some more details about this?
>>>>> I am just aware of a few questions on SO. But there were no questions about it on the mailing list lately... Did you get some more internal questions/feedback?
>>>>>
>>>>> And of course, other people should get involved as well! There is so much to do -- even if I work 40h a week on it, I cannot get everything done by myself. The last days were very busy for me. I hope I can work on a couple of bugs after the Munich Meetup. I started to look into them already...
>>>>>
>>>>> Should we start a roadmap in the Wiki? This might be helpful if more people get involved.
>>>>>
>>>>> And thanks for keeping me in the loop :)
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 11/06/2015 03:49 PM, Stephan Ewen wrote:
>>>>>> Hi Matthias!
>>>>>>
>>>>>> We are seeing a lot of people getting very excited about the Storm compatibility layer. I expect that quite a few people will seriously start to work with it.
>>>>>>
>>>>>> I would suggest that we also start getting involved in that. Since your priority is of course your Ph.D., it would be a bit much to ask of you to dedicate a lot of time to supporting more features, being super responsive with users all the time, etc.
>>>>>>
>>>>>> To that end, some people from us will start testing the API, adding fixes, etc. (which also helps us to understand this better when users ask questions).
>>>>>> We would definitely like you to stay involved (we don't want to hijack this), and help with ideas, especially when it comes to things like fault tolerance design, etc.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
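
For reference on the FileSpout vs. FiniteFileSpout point discussed earlier in the thread, here is a minimal sketch of how a regular Storm spout can be made finite by implementing the compatibility layer's FiniteSpout interface. The reachedEnd() signature, the nextLine() helper, and the collector field inherited from FileSpout are assumptions for illustration, not the exact code.

    import backtype.storm.tuple.Values;

    // Sketch only: extends the (assumed) FileSpout example spout and implements the
    // compatibility layer's FiniteSpout interface so Flink can terminate the source.
    public class FiniteFileSpout extends FileSpout implements FiniteSpout {

        private boolean finished = false;

        @Override
        public void nextTuple() {
            String line = nextLine();   // assumed helper reading the next input line, null at EOF
            if (line == null) {
                finished = true;        // input exhausted -> signal end of stream
            } else {
                collector.emit(new Values(line));   // collector assumed to come from FileSpout
            }
        }

        @Override
        public boolean reachedEnd() {
            // Lets the wrapper stop the otherwise infinite Storm source once the file is read.
            return finished;
        }
    }
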
