By the way, what was called BatchGradientDescent in the Scala examples should be replaced by a port of the LinearRegression Example from Java. I had them as two separate examples earlier.
What about RelationalQuery and TPC-H-Q3? Any thoughts about removing RelationalQuery?

On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> I added the Triangle Enumeration Examples, thanks Fabian.
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>
> These are the examples people called dibs on:
> - PageRank (Fabian)
> - BatchGradientDescent (Márton)
> - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
> - The relational Stuff
> - LinearRegression
>
> On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>> Thanks, I added it. I'll keep a running list of ported/unported
>> examples in my mails. I'll rename the java example package to examples
>> once the Scala API merge is done.
>>
>> I think the termination criterion is fine as it is. Just because Scala
>> enables functional programming doesn't mean it's always the best
>> choice. :D
>>
>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> WebLogAnalysis, TransitiveClosureNaive
>>
>> These are the examples people called dibs on:
>> - TriangleEnumeration and PageRank (Fabian)
>> - BatchGradientDescent (Márton)
>> - ComputeEdgeDegrees (Hermann)
>>
>> Those are unclaimed (if I'm not mistaken):
>> - The relational Stuff
>> - LinearRegression
>>
>> Cheers,
>> Aljoscha
>>
>> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>> Transitive closure here, I also added a termination criterion in the Java
>>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>>
>>> Perhaps you can make the termination criterion in Scala more functional?
>>>
>>> I noticed that the examples package name is example.java but examples.scala
>>>
>>> Kostas
>>>
>>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>>>
>>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>>>
>>>> If nobody volunteers for the relational stuff I can take those as well.
>>>>
>>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>>> top of TPC-H Q3?
>>>>
>>>> Kostas
>>>>
>>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>
>>>>> Thanks, I added it, along with an ITCase.
>>>>>
>>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>>> WebLogAnalysis
>>>>>
>>>>> These are the examples people called dibs on:
>>>>> - TriangleEnumeration and PageRank (Fabian)
>>>>> - BatchGradientDescent (Márton)
>>>>> - ComputeEdgeDegrees (Hermann)
>>>>>
>>>>> Those are unclaimed (if I'm not mistaken):
>>>>> - TransitiveClosure
>>>>> - The relational Stuff
>>>>> - LinearRegression
>>>>>
>>>>> Cheers,
>>>>> Aljoscha
>>>>>
>>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>>>> > WebLog here:
>>>>> >
>>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>>> >
>>>>> > Do you need any more done?
>>>>> >
>>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>> >
>>>>> >> I added the ConnectedComponents Example from Vasia.
>>>>> >>
>>>>> >> Keep 'em coming, people. :D
>>>>> >>
>>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
>>>>> >> > Alright, will do.
>>>>> >> > Thanks!
>>>>> >> >
>>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>> >> >
>>>>> >> >> Ok people, executive decision. :D
>>>>> >> >>
>>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the
>>>>> >> >> data in multi-dimensional object arrays and then converting it to the
>>>>> >> >> required Java or Scala objects.
>>>>> >> >>
>>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>>>> >> >> Java API.
>>>>> >> >>
>>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you
>>>>> >> >> can just write:
>>>>> >> >>
>>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>>>> >> >>
>>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhue...@apache.org> wrote:
>>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with
>>>>> >> >> > the Java API. In Java join is done as:
>>>>> >> >> >
>>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>>> >> >> >
>>>>> >> >> > where in the current Scala this is:
>>>>> >> >> >
>>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>>>> >> >> >
>>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because
>>>>> >> >> > "with" is a keyword in Scala. Should we offer something similar for
>>>>> >> >> > Scala or go with map() on Tuple2(left, right)?
>>>>> >> >> >
>>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <se...@apache.org>:
>>>>> >> >> >
>>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>>>>> >> >> >> generic representation of a Tuple.
>>>>> >> >> >>
>>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>>>>> >> >> >> with a generic utility method to convert between the two.
>>>>> >> >> >>
>>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhue...@apache.org> wrote:
>>>>> >> >> >>
>>>>> >> >> >> > Yeah, I ran into the same problem...
>>>>> >> >> >> >
>>>>> >> >> >> > +1 for using Strings and parsing them, but using the
>>>>> >> >> >> > CSVFormat won't work because this is based on a FileInputFormat.
>>>>> >> >> >> > So we would need to parse the Strings manually...
>>>>> >> >> >> >
>>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>> >> >> >> >
>>>>> >> >> >> > > Hi,
>>>>> >> >> >> > > on second thought. Maybe we should just change all the
>>>>> >> >> >> > > example input data to strings and use CSV input formats in all the
>>>>> >> >> >> > > examples. What do you think?
>>>>> >> >> >> > >
>>>>> >> >> >> > > Cheers,
>>>>> >> >> >> > > Aljoscha
>>>>> >> >> >> > >
>>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>> >> >> >> > > > Hi,
>>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible.
>>>>> >> >> >> > > > I'm afraid you have to do what you proposed: move the data to a
>>>>> >> >> >> > > > static field and convert it in the getDefaultEdgeDataSet() method in Scala.
>>>>> >> >> >> > > > It's not nice, but copying would duplicate the data and make it
>>>>> >> >> >> > > > easier for it to go out of sync in the Java and Scala versions.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > What do the others think? This will probably occur in all
>>>>> >> >> >> > > > the examples.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > Cheers,
>>>>> >> >> >> > > > Aljoscha
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>>>>> >> >> >> > > >> Hey,
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> I have ported the Connected Components example, but I am
>>>>> >> >> >> > > >> not sure how to reuse the example input data from java-examples.
>>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and
>>>>> >> >> >> > > >> edges data are produced by the methods getDefaultVertexDataSet()
>>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>>>>> >> >> >> > > >> class), but this introduces a conversion
>>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala
>>>>> >> >> >> > > >> tuple and from java.lang.Long to scala.Long and I guess this is an
>>>>> >> >> >> > > >> unnecessary complexity for an example (?).
>>>>> >> >> >> > > >> Another way is, of course, to copy the example data in
>>>>> >> >> >> > > >> the Scala example.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Am I missing something here?
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Thanks!
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Cheers,
>>>>> >> >> >> > > >> V.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty
>>>>> >> >> >> > > >>> much a copy of the Java example with some fixups for the syntax and
>>>>> >> >> >> > > >>> lambda functions.
>>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a
>>>>> >> >> >> > > >>> dependency for the scala-examples. I did this to reuse the example
>>>>> >> >> >> > > >>> input data.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> When you have ported a program you can do a pull request
>>>>> >> >> >> > > >>> against my repo and I will collect the examples.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> Happy coding. :D
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckone...@gmail.com> wrote:
>>>>> >> >> >> > > >>> > +1
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.mar...@gmail.com> wrote:
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >> +1
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> > +1
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through
>>>>> >> >> >> > > >>> >> > a tutorial so this will be a good stress test both
>>>>> >> >> >> > > >>> >> > for me and the new API :-)
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > -V.
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhue...@apache.org> wrote:
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their
>>>>> >> >> >> > > >>> >> > > > > favourite examples here.
>>>>> >> >> >> > > >>> >> > > > >
>>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhue...@apache.org> wrote:
>>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by
>>>>> >> >> >> > > >>> >> > > > > > different people proved to be
>>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple
>>>>> >> >> >> > > >>> >> > > > > > first one such as WordCount.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write
>>>>> >> >> >> > > >>> >> > > > > >> the tests and port the
>>>>> >> >> >> > > >>> >> > > > > >> examples.
>>>>> >> >> >> > > >>> >> > > > > >> Do you think it makes sense to
>>>>> >> >> >> > > >>> >> > > > > >> let other people port the
>>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and
>>>>> >> >> >> > > >>> >> > > > > >> maybe notices some quirks
>>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>>> >> >> >> > > >>> >> > > > > >> Aljoscha