I added the PageRank example, thanks again Fabian. :D

Regarding the other stuff:
- There is a comment in DataSet.scala about including org.apache.flink.api.scala._ because of the TypeInformation.
- I added generateSequence to ExecutionEnvironment.
- It is possible to use Scala primitives in Array. I noticed it while writing the tests; you probably had an older version of the code.
- Yes, using List and other interfaces is not possible; this is also a restriction in the Java API.

What do you think about the interface of join and coGroup? Right now, you can either use a lambda that returns an Option or a lambda with a Collector. Originally I also wanted to have a lambda that returns a Collection, but due to type erasure this has the same type as the lambda with the Option, so I couldn't use it. There is an implicit conversion from Option to a Collection, so I could change it without breaking the examples we have now. What do you think?

So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt, PageRank

These are the examples people called dibs on:
- BatchGradientDescent (Márton) (should be a port of the LinearRegression example from Java)
- ComputeEdgeDegrees (Hermann)

Those are unclaimed (if I'm not mistaken):
- The relational stuff

On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <[email protected]> wrote:
> +1 for removing RelationQuery
>
> On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <[email protected]> wrote:
>
>> By the way, what was called BatchGradientDescent in the Scala examples
>> should be replaced by a port of the LinearRegression Example from
>> Java. I had them as two separate examples earlier.
>>
>> What about RelationalQuery and TPC-H-Q3. Any thoughts about removing
>> RelationalQuery?
>>
>> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <[email protected]> wrote:
>> > I added the Triangle Enumeration Examples, thanks Fabian.
>> >
>> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>> >
>> > These are the examples people called dibs on:
>> > - PageRank (Fabian)
>> > - BatchGradientDescent (Márton)
>> > - ComputeEdgeDegrees (Hermann)
>> >
>> > Those are unclaimed (if I'm not mistaken):
>> > - The relational Stuff
>> > - LinearRegression
>> >
>> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <[email protected]> wrote:
>> >> Thanks, I added it.
>> >> I'll keep a running list of ported/unported
>> >> examples in my mails. I'll rename the java example package to examples
>> >> once the Scala API merge is done.
>> >>
>> >> I think the termination criterion is fine as it is. Just because Scala
>> >> enables functional programming doesn't mean it's always the best
>> >> choice. :D
>> >>
>> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >> WebLogAnalysis, TransitiveClosureNaive
>> >>
>> >> These are the examples people called dibs on:
>> >> - TriangleEnumration and PageRank (Fabian)
>> >> - BatchGradientDescent (Márton)
>> >> - ComputeEdgeDegrees (Hermann)
>> >>
>> >> Those are unclaimed (if I'm not mistaken):
>> >> - The relational Stuff
>> >> - LinearRegression
>> >>
>> >> Cheers,
>> >> Aljoscha
>> >>
>> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <[email protected]> wrote:
>> >>> Transitive closure here, I also added a termination criterion in the Java
>> >>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>> >>>
>> >>> Perhaps you can make the termination criterion in Scala more functional?
>> >>>
>> >>> I noticed that the examples package name is example.java but examples.scala
>> >>>
>> >>> Kostas
>> >>>
>> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <[email protected]> wrote:
>> >>>>
>> >>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>> >>>>
>> >>>> If nobody volunteers for the relational stuff I can take those as well.
>> >>>>
>> >>>> How about removing the "RelationalQuery" from both Scala and Java? It
>> >>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>> >>>> top of TPC-H Q3?
>> >>>>
>> >>>> Kostas
>> >>>>
>> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <[email protected]> wrote:
>> >>>>>
>> >>>>> Thanks, I added it, along with an ITCase.
>> >>>>>
>> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> >>>>> WebLogAnalysis
>> >>>>>
>> >>>>> These are the examples people called dibs on:
>> >>>>> - TriangleEnumration and PageRank (Fabian)
>> >>>>> - BatchGradientDescent (Márton)
>> >>>>> - ComputeEdgeDegrees (Hermann)
>> >>>>>
>> >>>>> Those are unclaimed (if I'm not mistaken):
>> >>>>> - TransitiveClosure
>> >>>>> - The relational Stuff
>> >>>>> - LinearRegression
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Aljoscha
>> >>>>>
>> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <[email protected]> wrote:
>> >>>>> > WebLog here:
>> >>>>> >
>> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>> >>>>> >
>> >>>>> > Do you need any more done?
>> >>>>> >
>> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <[email protected]> wrote:
>> >>>>> >
>> >>>>> >> I added the ConnectedComponents Example from Vasia.
>> >>>>> >>
>> >>>>> >> Keep 'em coming, people. :D
>> >>>>> >>
>> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <[email protected]> wrote:
>> >>>>> >> > Alright, will do.
>> >>>>> >> > Thanks!
>> >>>>> >> >
>> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <[email protected]>:
>> >>>>> >> >
>> >>>>> >> >> Ok people, executive decision. :D
>> >>>>> >> >>
>> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>> >>>>> >> >> in multi-dimensional object arrays and then converting it to the
>> >>>>> >> >> required Java or Scala objects.
>> >>>>> >> >>
>> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java
>> >>>>> >> >> API.
>> >>>>> >> >>
>> >>>>> >> >> Regarding Join (and coGroup).
>> >>>>> >> >> There is no need for a keyword, you can
>> >>>>> >> >> just write:
>> >>>>> >> >>
>> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>> >>>>> >> >>
>> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <[email protected]> wrote:
>> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
>> >>>>> >> >> > API. In Java join is done as:
>> >>>>> >> >> >
>> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>> >>>>> >> >> >
>> >>>>> >> >> > where in the current Scala this is:
>> >>>>> >> >> >
>> >>>>> >> >> > ds1.join(d2).where(...).isEqualTo(...)
>> >>>>> >> >> >
>> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>> >>>>> >> >> > a keyword in Scala. Should be offer something similar for Scala or go with
>> >>>>> >> >> > map() on Tuple2(left, right)?
>> >>>>> >> >> >
>> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <[email protected]>:
>> >>>>> >> >> >
>> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>> >>>>> >> >> >> representation of a Tuple.
>> >>>>> >> >> >>
>> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>> >>>>> >> >> >> utility method to convert between the two.
>> >>>>> >> >> >>
>> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <[email protected]> wrote:
>> >>>>> >> >> >>
>> >>>>> >> >> >> > Yeah, I ran into the same problem...
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>> >>>>> >> >> >> > because this is based on a FileInputFormat.
>> >>>>> >> >> >> > So we would need to parse the Strings manually...
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <[email protected]>:
>> >>>>> >> >> >> >
>> >>>>> >> >> >> > > Hi,
>> >>>>> >> >> >> > > on second thought. Maybe we should just change all the example input
>> >>>>> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
>> >>>>> >> >> >> > > you think?
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> > > Cheers,
>> >>>>> >> >> >> > > Aljoscha
>> >>>>> >> >> >> > >
>> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <[email protected]> wrote:
>> >>>>> >> >> >> > > > Hi,
>> >>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
>> >>>>> >> >> >> > > > you have to to what you proposed: move the data to a static field and
>> >>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> >>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>> >>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > What do the others think? This will probably occur in all the examples.
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > Cheers,
>> >>>>> >> >> >> > > > Aljoscha
>> >>>>> >> >> >> > > >
>> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri <[email protected]> wrote:
>> >>>>> >> >> >> > > >> Hey,
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>> >>>>> >> >> >> > > >> reuse the example input data from java-examples.
>> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>> >>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>> >>>>> >> >> >> > > >> class), but this introduces a conversion
>> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>> >>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>> >>>>> >> >> >> > > >> for an example (?).
>> >>>>> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Am I missing something here?
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Thanks!
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> Cheers,
>> >>>>> >> >> >> > > >> V.
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <[email protected]> wrote:
>> >>>>> >> >> >> > > >>
>> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>> >>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>> >>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>> >>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> When you ported a program you can do a pull request against my repo
>> >>>>> >> >> >> > > >>> and I will collect the examples.
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> Happy coding. :D
>> >>>>> >> >> >> > > >>>
>> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> > +1
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> >
>> >>>>> >> >> >> > > >>> >> +1
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> >>
>> >>>>> >> >> >> > > >>> >> > +1
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this will
>> >>>>> >> >> >> > > >>> >> > be a good stress test both for me and the new API :-)
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> >> >
>> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > -V.
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> >> > >
>> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <[email protected]>:
>> >>>>> >> >> >> > > >>> >> > > >
>> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>> >>>>> >> >> >> > > >>> >> > > > >
>> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <[email protected]> wrote:
>> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>> >>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <[email protected]>:
>> >>>>> >> >> >> > > >>> >> > > > > >
>> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>> >>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>> >>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>> >>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>> >>>>> >> >> >> > > >>> >> > > > > >> in the API?
>> >>>>> >> >> >> > > >>> >> > > > > >>
>> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
