I added the ConnectedComponents Example from Vasia. Keep 'em coming, people. :D
On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
> Alright, will do.
> Thanks!
>
> 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>
>> Ok people, executive decision. :D
>>
>> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>> in multi-dimensional object arrays and then converting it to the
>> required Java or Scala objects.
>>
>> Also, I changed isEqualTo to equalTo to make it consistent with the Java
>> API.
>>
>> Regarding Join (and coGroup). There is no need for a keyword, you can
>> just write:
>>
>> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>
>> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
>> > Aside from the DataSet issue, I also found an inconsistency with the Java
>> > API. In Java join is done as:
>> >
>> > ds1.join(ds2).where(...).equalTo(...)
>> >
>> > where in the current Scala this is:
>> >
>> > ds1.join(ds2).where(...).isEqualTo(...)
>> >
>> > isEqualTo() should be renamed to equalTo(), IMO.
>> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>> > a keyword in Scala. Should we offer something similar for Scala or go with
>> > map() on Tuple2(left, right)?
>> >
>> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <se...@apache.org>:
>> >
>> >> Instead of Strings, Object[][] would work as well. That is a generic
>> >> representation of a Tuple.
>> >>
>> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>> >> utility method to convert between the two.
>> >>
>> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhue...@apache.org> wrote:
>> >>
>> >> > Yeah, I ran into the same problem...
>> >> >
>> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>> >> > because this is based on a FileInputFormat.
>> >> > So we would need to parse the Strings manually...
>> >> >
>> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>> >> >
>> >> > > Hi,
>> >> > > on second thought. Maybe we should just change all the example input
>> >> > > data to strings and use CSV input formats in all the examples. What do
>> >> > > you think?
>> >> > >
>> >> > > Cheers,
>> >> > > Aljoscha
>> >> > >
>> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljos...@apache.org>
>> >> > > wrote:
>> >> > > > Hi,
>> >> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
>> >> > > > you have to do what you proposed: move the data to a static field and
>> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> >> > > > nice, but copying would duplicate the data and make it easier for it
>> >> > > > to go out of sync in the Java and Scala versions.
>> >> > > >
>> >> > > > What do the others think? This will probably occur in all the examples.
>> >> > > >
>> >> > > > Cheers,
>> >> > > > Aljoscha
>> >> > > >
>> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>> >> > > > <vasilikikala...@gmail.com> wrote:
>> >> > > >> Hey,
>> >> > > >>
>> >> > > >> I have ported the Connected Components example, but I am not sure how to
>> >> > > >> reuse the example input data from java-examples.
>> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>> >> > > >> produced by the methods getDefaultVertexDataSet()
>> >> > > >> and getDefaultEdgeDataSet(), which take
>> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>> >> > > >>
>> >> > > >> One way is to provide public static fields (like in the WordCountData
>> >> > > >> class), but this introduces a conversion
>> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>> >> > > >> for an example (?).
>> >> > > >> Another way is, of course, to copy the example data in the Scala
>> >> > > >> example.
>> >> > > >>
>> >> > > >> Am I missing something here?
>> >> > > >>
>> >> > > >> Thanks!
>> >> > > >>
>> >> > > >> Cheers,
>> >> > > >> V.
>> >> > > >>
>> >> > > >>
>> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljos...@apache.org>
>> >> > > >> wrote:
>> >> > > >>
>> >> > > >>> Alright, I updated my repo:
>> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> > > >>>
>> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>> >> > > >>> the scala-examples. I did this to reuse the example input data.
>> >> > > >>>
>> >> > > >>> Once you have ported a program, you can do a pull request against my repo
>> >> > > >>> and I will collect the examples.
>> >> > > >>>
>> >> > > >>> Happy coding. :D
>> >> > > >>>
>> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckone...@gmail.com>
>> >> > > >>> wrote:
>> >> > > >>> > +1
>> >> > > >>> >
>> >> > > >>> > ComputeEdgeDegrees for me!
>> >> > > >>> >
>> >> > > >>> >
>> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.mar...@gmail.com>
>> >> > > >>> > wrote:
>> >> > > >>> >
>> >> > > >>> >> +1
>> >> > > >>> >>
>> >> > > >>> >> BatchGradientDescent for me :)
>> >> > > >>> >>
>> >> > > >>> >>
>> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzou...@apache.org>
>> >> > > >>> >> wrote:
>> >> > > >>> >>
>> >> > > >>> >> > +1
>> >> > > >>> >> >
>> >> > > >>> >> > I go for WebLogAnalysis.
>> >> > > >>> >> >
>> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this will
>> >> > > >>> >> > be a good stress test both for me and the new API :-)
>> >> > > >>> >> >
>> >> > > >>> >> >
>> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikala...@gmail.com>
>> >> > > >>> >> > wrote:
>> >> > > >>> >> >
>> >> > > >>> >> > > +1 for having other people implement the examples!
>> >> > > >>> >> > > Connected Components and Kmeans for me :)
>> >> > > >>> >> > >
>> >> > > >>> >> > > -V.
>> >> > > >>> >> > >
>> >> > > >>> >> > >
>> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhue...@apache.org> wrote:
>> >> > > >>> >> > >
>> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> >> > > >>> >> > > >
>> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>> >> > > >>> >> > > > - running out-of-the-box without parameters
>> >> > > >>> >> > > > - parameters for external data
>> >> > > >>> >> > > > - follow a similar code structure
>> >> > > >>> >> > > >
>> >> > > >>> >> > > >
>> >> > > >>> >> > > >
>> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>> >> > > >>> >> > > >
>> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>> >> > > >>> >> > > > >
>> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhue...@apache.org>
>> >> > > >>> >> > > > > wrote:
>> >> > > >>> >> > > > > > Hi,
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>> >> > > >>> >> > > > > > valuable in the past.
>> >> > > >>> >> > > > > > I'd help with two or three examples.
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as
>> >> > > >>> >> > > > > > WordCount.
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > > Fabian
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>> >> > > >>> >> > > > > >
>> >> > > >>> >> > > > > >> Hi,
>> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> > > >>> >> > > > > >>
>> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>> >> > > >>> >> > > > > >> in the API?
>> >> > > >>> >> > > > > >>
>> >> > > >>> >> > > > > >> Cheers,
>> >> > > >>> >> > > > > >> Aljoscha
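
For anyone reading the thread later: the Object[][] idea discussed above (keep the example data once in an API-neutral array, convert it to typed objects where needed) could look roughly like the sketch below. This is only an illustration under stated assumptions. The class name EdgeDataSketch, the Edge type, and the sample values are all hypothetical; they stand in for the real KMeansData / ConnectedComponentsData classes and for Flink's Tuple2, none of which are reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an example-data class; not the actual Flink code.
final class EdgeDataSketch {

    // API-neutral storage: one Object[] per record, as suggested in the thread.
    // The values are made up for illustration.
    static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Hypothetical typed record. A real example would presumably use
    // Tuple2<Long, Long> on the Java side or (Long, Long) on the Scala side.
    static final class Edge {
        final long source;
        final long target;

        Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    // Converts the generic rows into typed objects; each API would have
    // its own small conversion like this over the same shared array.
    static List<Edge> toEdges(Object[][] rows) {
        List<Edge> edges = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            edges.add(new Edge((Long) row[0], (Long) row[1]));
        }
        return edges;
    }

    public static void main(String[] args) {
        for (Edge e : toEdges(EDGES)) {
            System.out.println(e.source + " -> " + e.target);
        }
    }
}
```

The point of the design is that the data lives in one place, so the Java and Scala examples cannot drift out of sync; the cost is an unchecked cast per field in each conversion.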