I added the Triangle Enumeration Examples, thanks Fabian.

So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt

These are the examples people called dibs on:
- PageRank (Fabian)
- BatchGradientDescent (Márton)
- ComputeEdgeDegrees (Hermann)

Those are unclaimed (if I'm not mistaken):
- The relational Stuff
- LinearRegression

On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[email protected]> wrote:
> Thanks, I added it. I'll keep a running list of ported/unported
> examples in my mails. I'll rename the java example package to examples
> once the Scala API merge is done.
>
> I think the termination criterion is fine as it is. Just because Scala
> enables functional programming doesn't mean it's always the best
> choice. :D
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive
>
> These are the examples people called dibs on:
> - TriangleEnumeration and PageRank (Fabian)
> - BatchGradientDescent (Márton)
> - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
> - The relational Stuff
> - LinearRegression
>
> Cheers,
> Aljoscha
>
> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[email protected]> wrote:
>> Transitive closure here, I also added a termination criterion in the Java
>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>
>> Perhaps you can make the termination criterion in Scala more functional?
>>
>> I noticed that the examples package name is example.java but examples.scala
>>
>> Kostas
>>
>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[email protected]> wrote:
>>>
>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>>
>>> If nobody volunteers for the relational stuff I can take those as well.
>>>
>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>> top of TPC-H Q3?
>>>
>>> Kostas
>>>
>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[email protected]>
>>> wrote:
>>>>
>>>> Thanks, I added it, along with an ITCase.
>>>>
>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> WebLogAnalysis
>>>>
>>>> These are the examples people called dibs on:
>>>> - TriangleEnumeration and PageRank (Fabian)
>>>> - BatchGradientDescent (Márton)
>>>> - ComputeEdgeDegrees (Hermann)
>>>>
>>>> Those are unclaimed (if I'm not mistaken):
>>>> - TransitiveClosure
>>>> - The relational Stuff
>>>> - LinearRegression
>>>>
>>>> Cheers,
>>>> Aljoscha
>>>>
>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[email protected]>
>>>> wrote:
>>>> > WebLog here:
>>>> >
>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>> >
>>>> > Do you need any more done?
>>>> >
>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <[email protected]>
>>>> > wrote:
>>>> >
>>>> >> I added the ConnectedComponents Example from Vasia.
>>>> >>
>>>> >> Keep 'em coming, people. :D
>>>> >>
>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[email protected]>
>>>> >> wrote:
>>>> >> > Alright, will do.
>>>> >> > Thanks!
>>>> >> >
>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <[email protected]>:
>>>> >> >
>>>> >> >> Ok people, executive decision. :D
>>>> >> >>
>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>>>> >> >> in multi-dimensional object arrays and then converting it to the
>>>> >> >> required Java or Scala objects.
>>>> >> >>
>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>>> >> >> Java API.
>>>> >> >>
>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you can
>>>> >> >> just write:
>>>> >> >>
>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>>> >> >>
>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[email protected]> wrote:
>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
>>>> >> >> > API. In Java join is done as:
>>>> >> >> >
>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>> >> >> >
>>>> >> >> > where in the current Scala this is:
>>>> >> >> >
>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>>> >> >> >
>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>>>> >> >> > a keyword in Scala. Should we offer something similar for Scala or go with
>>>> >> >> > map() on Tuple2(left, right)?
>>>> >> >> >
>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[email protected]>:
>>>> >> >> >
>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>>>> >> >> >> representation of a Tuple.
>>>> >> >> >>
>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>>>> >> >> >> utility method to convert between the two.
>>>> >> >> >>
>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <[email protected]> wrote:
>>>> >> >> >>
>>>> >> >> >> > Yeah, I ran into the same problem...
>>>> >> >> >> >
>>>> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>>>> >> >> >> > because this is based on a FileInputFormat.
>>>> >> >> >> > So we would need to parse the Strings manually...
>>>> >> >> >> >
>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <[email protected]>:
>>>> >> >> >> >
>>>> >> >> >> > > Hi,
>>>> >> >> >> > > on second thought. Maybe we should just change all the example input
>>>> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
>>>> >> >> >> > > you think?
>>>> >> >> >> > >
>>>> >> >> >> > > Cheers,
>>>> >> >> >> > > Aljoscha
>>>> >> >> >> > >
>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <[email protected]>
>>>> >> >> >> > > wrote:
>>>> >> >> >> > > > Hi,
>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
>>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>>> >> >> >> > > >
>>>> >> >> >> > > > What do the others think? This will probably occur in all the examples.
>>>> >> >> >> > > >
>>>> >> >> >> > > > Cheers,
>>>> >> >> >> > > > Aljoscha
>>>> >> >> >> > > >
>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>>>> >> >> >> > > > <[email protected]> wrote:
>>>> >> >> >> > > >> Hey,
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>>>> >> >> >> > > >> reuse the example input data from java-examples.
>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>>>> >> >> >> > > >> class), but this introduces a conversion
>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>>>> >> >> >> > > >> for an example (?).
>>>> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> Am I missing something here?
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> Thanks!
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> Cheers,
>>>> >> >> >> > > >> V.
>>>> >> >> >> > > >>
>>>> >> >> >> > > >>
>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <[email protected]> wrote:
>>>> >> >> >> > > >>
>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >> >> > > >>>
>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>>>> >> >> >> > > >>>
>>>> >> >> >> > > >>> When you have ported a program you can do a pull request against my repo
>>>> >> >> >> > > >>> and I will collect the examples.
>>>> >> >> >> > > >>>
>>>> >> >> >> > > >>> Happy coding. :D
>>>> >> >> >> > > >>>
>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <[email protected]>
>>>> >> >> >> > > >>> wrote:
>>>> >> >> >> > > >>> > +1
>>>> >> >> >> > > >>> >
>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>> >> >> >> > > >>> >
>>>> >> >> >> > > >>> >
>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <[email protected]>
>>>> >> >> >> > > >>> > wrote:
>>>> >> >> >> > > >>> >
>>>> >> >> >> > > >>> >> +1
>>>> >> >> >> > > >>> >>
>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>> >> >> >> > > >>> >>
>>>> >> >> >> > > >>> >>
>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <[email protected]>
>>>> >> >> >> > > >>> >> wrote:
>>>> >> >> >> > > >>> >>
>>>> >> >> >> > > >>> >> > +1
>>>> >> >> >> > > >>> >> >
>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>> >> >> >> > > >>> >> >
>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this will
>>>> >> >> >> > > >>> >> > be a good stress test both for me and the new API :-)
>>>> >> >> >> > > >>> >> >
>>>> >> >> >> > > >>> >> >
>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <
>>>> >> >> >> > > >>> >> > [email protected]>
>>>> >> >> >> > > >>> >> > wrote:
>>>> >> >> >> > > >>> >> >
>>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>>> >> >> >> > > >>> >> > >
>>>> >> >> >> > > >>> >> > > -V.
>>>> >> >> >> > > >>> >> > >
>>>> >> >> >> > > >>> >> > >
>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <[email protected]> wrote:
>>>> >> >> >> > > >>> >> > >
>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>>> >> >> >> > > >>> >> > > >
>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>> >> >> >> > > >>> >> > > >
>>>> >> >> >> > > >>> >> > > >
>>>> >> >> >> > > >>> >> > > >
>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <[email protected]>:
>>>> >> >> >> > > >>> >> > > >
>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>>>> >> >> >> > > >>> >> > > > >
>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <[email protected]> wrote:
>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <[email protected]>:
>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
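
For anyone skimming the thread: the shared-data approach discussed in the quoted mails (keep the raw example data once as Object[][] and convert it to typed objects where needed) could look roughly like the sketch below. The class name SharedEdgeData, the Edge type, and the sample rows are made up for illustration. This is not the actual KMeansData.java, and a real example would presumably convert to the Java or Scala tuple types instead of a local class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual KMeansData.java from the thread.
final class SharedEdgeData {

    // Raw example data, stored once in an untyped form that both
    // Java and Scala code can read (sample values are invented).
    static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Hypothetical stand-in for a typed edge; the real examples would
    // use the tuple types of the respective API.
    static final class Edge {
        final long source;
        final long target;

        Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    // Converts the untyped rows into typed objects on demand.
    static List<Edge> toEdges(Object[][] rows) {
        List<Edge> edges = new ArrayList<>();
        for (Object[] row : rows) {
            edges.add(new Edge((Long) row[0], (Long) row[1]));
        }
        return edges;
    }

    public static void main(String[] args) {
        for (Edge e : toEdges(EDGES)) {
            System.out.println(e.source + " -> " + e.target);
        }
    }
}
```

The point of the design is that the data lives in exactly one place, so the Java and Scala examples cannot drift out of sync; the cost is one small conversion step per language.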
