I'll take TransitiveClosure and PiEstimation (was not on your list). If nobody volunteers for the relational stuff I can take those as well.
How about removing the "RelationalQuery" from both Scala and Java? It seems to be a proper subset of TPC-H Q3. Does it add some teaching value on top of TPC-H Q3?

Kostas

On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

> Thanks, I added it, along with an ITCase.
>
> So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis
>
> These are the examples people called dibs on:
> - TriangleEnumeration and PageRank (Fabian)
> - BatchGradientDescent (Márton)
> - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
> - TransitiveClosure
> - The relational Stuff
> - LinearRegression
>
> Cheers,
> Aljoscha
>
> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> > WebLog here:
> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >
> > Do you need any more done?
> >
> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
> >
> >> I added the ConnectedComponents Example from Vasia.
> >>
> >> Keep 'em coming, people. :D
> >>
> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
> >> > Alright, will do.
> >> > Thanks!
> >> >
> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
> >> >
> >> >> Ok people, executive decision. :D
> >> >>
> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
> >> >> in multi-dimensional object arrays and then converting it to the
> >> >> required Java or Scala objects.
> >> >>
> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java API.
> >> >>
> >> >> Regarding Join (and coGroup): there is no need for a keyword, you can
> >> >> just write:
> >> >>
> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> >> >>
> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
> >> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
> >> >> > API. In Java join is done as:
> >> >> >
> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >> >> >
> >> >> > where in the current Scala this is:
> >> >> >
> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> >> >> >
> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
> >> >> > a keyword in Scala. Should we offer something similar for Scala or go with
> >> >> > map() on Tuple2(left, right)?
> >> >> >
> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <se...@apache.org>:
> >> >> >
> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
> >> >> >> representation of a Tuple.
> >> >> >>
> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
> >> >> >> utility method to convert between the two.
> >> >> >>
> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhue...@apache.org> wrote:
> >> >> >>
> >> >> >> > Yeah, I ran into the same problem...
> >> >> >> >
> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
> >> >> >> > because this is based on a FileInputFormat.
> >> >> >> > So we would need to parse the Strings manually...
> >> >> >> >
> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
> >> >> >> >
> >> >> >> > > Hi,
> >> >> >> > > on second thought, maybe we should just change all the example input
> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
> >> >> >> > > you think?
> >> >> >> > >
> >> >> >> > > Cheers,
> >> >> >> > > Aljoscha
> >> >> >> > >
> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> >> >> >> > > > Hi,
> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
> >> >> >> > > > you have to do what you proposed: move the data to a static field and
> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
> >> >> >> > > > to go out of sync in the Java and Scala versions.
> >> >> >> > > >
> >> >> >> > > > What do the others think? This will probably occur in all the examples.
> >> >> >> > > >
> >> >> >> > > > Cheers,
> >> >> >> > > > Aljoscha
> >> >> >> > > >
> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> >> >> >> > > >> Hey,
> >> >> >> > > >>
> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
> >> >> >> > > >> reuse the example input data from java-examples.
> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
> >> >> >> > > >>
> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
> >> >> >> > > >> class), but this introduces a conversion
> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
> >> >> >> > > >> for an example (?).
> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
> >> >> >> > > >>
> >> >> >> > > >> Am I missing something here?
> >> >> >> > > >>
> >> >> >> > > >> Thanks!
> >> >> >> > > >>
> >> >> >> > > >> Cheers,
> >> >> >> > > >> V.
> >> >> >> > > >>
> >> >> >> > > >>
> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljos...@apache.org> wrote:
> >> >> >> > > >>
> >> >> >> > > >>> Alright, I updated my repo:
> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >> >> > > >>>
> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
> >> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
> >> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
> >> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
> >> >> >> > > >>>
> >> >> >> > > >>> When you have ported a program, you can do a pull request against my repo
> >> >> >> > > >>> and I will collect the examples.
> >> >> >> > > >>>
> >> >> >> > > >>> Happy coding. :D
> >> >> >> > > >>>
> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckone...@gmail.com> wrote:
> >> >> >> > > >>> > +1
> >> >> >> > > >>> >
> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >> >> >> > > >>> >
> >> >> >> > > >>> >
> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
> >> >> >> > > >>> >
> >> >> >> > > >>> >> +1
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >> >> >> > > >>> >>
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzou...@apache.org> wrote:
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> > +1
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this will
> >> >> >> > > >>> >> > be a good stress test both for me and the new API :-)
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > -V.
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhue...@apache.org> wrote:
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
> >> >> >> > > >>> >> > > > - parameters for external data
> >> >> >> > > >>> >> > > > - follow a similar code structure
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
> >> >> >> > > >>> >> > > > >
> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhue...@apache.org> wrote:
> >> >> >> > > >>> >> > > > > > Hi,
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
> >> >> >> > > >>> >> > > > > > valuable in the past.
> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > Fabian
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > >> Hi,
> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >> >> > > >>> >> > > > > >>
> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
> >> >> >> > > >>> >> > > > > >> in the API?
> >> >> >> > > >>> >> > > > > >>
> >> >> >> > > >>> >> > > > > >> Cheers,
> >> >> >> > > >>> >> > > > > >> Aljoscha
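
For reference, the approach discussed in the thread (keeping the example data once as an untyped Object[][] and converting it to typed objects for each API) could look roughly like the Java sketch below. This is only an illustration under assumptions: Pair, EdgeData, EDGES and toPairs are made-up names, not the actual classes in the Flink repository, and Pair is just a stand-in for a real tuple type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a two-field tuple type; this is NOT Flink's Tuple2.
final class Pair<A, B> {
    final A first;
    final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }
}

// Hypothetical holder for shared example data, stored once as untyped rows.
final class EdgeData {
    // Each row is one edge: {sourceVertexId, targetVertexId}.
    static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Converts the untyped rows into typed pairs. A Scala example could run
    // an equivalent conversion into native Scala tuples from the same array.
    static List<Pair<Long, Long>> toPairs(Object[][] rows) {
        List<Pair<Long, Long>> result = new ArrayList<>();
        for (Object[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("expected 2 fields, got " + row.length);
            }
            result.add(new Pair<>((Long) row[0], (Long) row[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Pair<Long, Long> edge : toPairs(EDGES)) {
            System.out.println(edge.first + " -> " + edge.second);
        }
    }
}
```

The point of the design is that the data lives in one place, so the Java and Scala examples cannot drift out of sync; the cost is an unchecked cast per field at conversion time.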