Thanks, I added it. I'll keep a running list of ported/unported examples in my mails. I'll rename the Java example package to examples once the Scala API merge is done.
I think the termination criterion is fine as it is. Just because Scala enables functional programming doesn't mean it's always the best choice. :D

So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis, TransitiveClosureNaive

These are the examples people called dibs on:
- TriangleEnumeration and PageRank (Fabian)
- BatchGradientDescent (Márton)
- ComputeEdgeDegrees (Hermann)

Those are unclaimed (if I'm not mistaken):
- The relational stuff
- LinearRegression

Cheers,
Aljoscha

On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> Transitive closure here, I also added a termination criterion in the Java
> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>
> Perhaps you can make the termination criterion in Scala more functional?
>
> I noticed that the examples package name is example.java but examples.scala
>
> Kostas
>
> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>
>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>
>> If nobody volunteers for the relational stuff I can take those as well.
>>
>> How about removing the "RelationalQuery" from both Scala and Java? It
>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>> top of TPC-H Q3?
>>
>> Kostas
>>
>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljos...@apache.org>
>> wrote:
>>>
>>> Thanks, I added it, along with an ITCase.
>>>
>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> WebLogAnalysis
>>>
>>> These are the examples people called dibs on:
>>> - TriangleEnumeration and PageRank (Fabian)
>>> - BatchGradientDescent (Márton)
>>> - ComputeEdgeDegrees (Hermann)
>>>
>>> Those are unclaimed (if I'm not mistaken):
>>> - TransitiveClosure
>>> - The relational stuff
>>> - LinearRegression
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzou...@apache.org>
>>> wrote:
>>> > WebLog here:
>>> >
>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>> >
>>> > Do you need any more done?
>>> >
>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljos...@apache.org>
>>> > wrote:
>>> >
>>> >> I added the ConnectedComponents Example from Vasia.
>>> >>
>>> >> Keep 'em coming, people. :D
>>> >>
>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhue...@apache.org>
>>> >> wrote:
>>> >> > Alright, will do.
>>> >> > Thanks!
>>> >> >
>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>> >> >
>>> >> >> Ok people, executive decision. :D
>>> >> >>
>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>>> >> >> in multi-dimensional object arrays and then converting it to the
>>> >> >> required Java or Scala objects.
>>> >> >>
>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>> >> >> Java API.
>>> >> >>
>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you can
>>> >> >> just write:
>>> >> >>
>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>> >> >>
>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the
>>> >> >> > Java API. In Java join is done as:
>>> >> >> >
>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>> >> >> >
>>> >> >> > where in the current Scala this is:
>>> >> >> >
>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>> >> >> >
>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>>> >> >> > a keyword in Scala. Should we offer something similar for Scala or go with
>>> >> >> > map() on Tuple2(left, right)?
>>> >> >> >
>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <se...@apache.org>:
>>> >> >> >
>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>>> >> >> >> representation of a Tuple.
>>> >> >> >>
>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>>> >> >> >> utility method to convert between the two.
>>> >> >> >>
>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhue...@apache.org> wrote:
>>> >> >> >>
>>> >> >> >> > Yeah, I ran into the same problem...
>>> >> >> >> >
>>> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>>> >> >> >> > because this is based on a FileInputFormat.
>>> >> >> >> > So we would need to parse the Strings manually...
>>> >> >> >> >
>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>> >> >> >> >
>>> >> >> >> > > Hi,
>>> >> >> >> > > on second thought. Maybe we should just change all the example input
>>> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
>>> >> >> >> > > you think?
>>> >> >> >> > >
>>> >> >> >> > > Cheers,
>>> >> >> >> > > Aljoscha
>>> >> >> >> > >
>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljos...@apache.org>
>>> >> >> >> > > wrote:
>>> >> >> >> > > > Hi,
>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>> >> >> >> > > >
>>> >> >> >> > > > What do the others think? This will probably occur in all the examples.
>>> >> >> >> > > >
>>> >> >> >> > > > Cheers,
>>> >> >> >> > > > Aljoscha
>>> >> >> >> > > >
>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>>> >> >> >> > > > <vasilikikala...@gmail.com> wrote:
>>> >> >> >> > > >> Hey,
>>> >> >> >> > > >>
>>> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>>> >> >> >> > > >> reuse the example input data from java-examples.
>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>>> >> >> >> > > >>
>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>>> >> >> >> > > >> class), but this introduces a conversion
>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>>> >> >> >> > > >> for an example (?).
>>> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>>> >> >> >> > > >>
>>> >> >> >> > > >> Am I missing something here?
>>> >> >> >> > > >>
>>> >> >> >> > > >> Thanks!
>>> >> >> >> > > >>
>>> >> >> >> > > >> Cheers,
>>> >> >> >> > > >> V.
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljos...@apache.org> wrote:
>>> >> >> >> > > >>
>>> >> >> >> > > >>> Alright, I updated my repo:
>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>>> >> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> When you ported a program you can do a pull request against my repo
>>> >> >> >> > > >>> and I will collect the examples.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> Happy coding. :D
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckone...@gmail.com>
>>> >> >> >> > > >>> wrote:
>>> >> >> >> > > >>> > +1
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.mar...@gmail.com>
>>> >> >> >> > > >>> > wrote:
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> >> +1
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzou...@apache.org>
>>> >> >> >> > > >>> >> wrote:
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> > +1
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this will
>>> >> >> >> > > >>> >> > be a good stress test both for me and the new API :-)
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikala...@gmail.com>
>>> >> >> >> > > >>> >> > wrote:
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > -V.
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhue...@apache.org> wrote:
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>> >> >> >> > > >>> >> > > > - parameters for external data
>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>>> >> >> >> > > >>> >> > > > >
>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhue...@apache.org>
>>> >> >> >> > > >>> >> > > > > wrote:
>>> >> >> >> > > >>> >> > > > > > Hi,
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as
>>> >> >> >> > > >>> >> > > > > > WordCount.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > Fabian
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >> Hi,
>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>>> >> >> >> > > >>> >> > > > > >> in the API?
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>> >> >> >> > > >>> >> > > > > >> Aljoscha
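
For readers following the data-sharing discussion in this thread (storing example input once as Object[][] and converting it into the objects each API needs), a minimal sketch of that idea might look like the following. It is an illustration only and is not taken from KMeansData.java or any Flink class: the names SharedExampleData, EDGES, Pair and toPairs are hypothetical, the sample values are made up, and Pair is a plain stand-in for a two-field tuple rather than Flink's Tuple2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: example data kept once in a generic Object[][] form
// and converted on demand into typed two-field records.
public class SharedExampleData {

    // Made-up sample edges; each row is one (source, target) record.
    public static final Object[][] EDGES = { {1L, 2L}, {2L, 3L}, {3L, 4L} };

    // Minimal stand-in for a two-field tuple (assumption: NOT Flink's Tuple2).
    public static final class Pair<A, B> {
        public final A first;
        public final B second;

        public Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Converts every two-element row into a typed Pair, checking field types at runtime.
    public static <A, B> List<Pair<A, B>> toPairs(Object[][] rows, Class<A> firstType, Class<B> secondType) {
        List<Pair<A, B>> result = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("expected 2 fields, got " + row.length);
            }
            // Class.cast throws ClassCastException if a field has the wrong type.
            result.add(new Pair<>(firstType.cast(row[0]), secondType.cast(row[1])));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Pair<Long, Long> edge : toPairs(EDGES, Long.class, Long.class)) {
            System.out.println(edge.first + " -> " + edge.second);
        }
    }
}
```

The point of the design is that the raw rows live in one place, so a Java example and a Scala example can each run their own conversion without the two copies of the data drifting out of sync.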