Aside from the DataSet issue, I also found an inconsistency with the Java API. In Java, a join is written as:

ds1.join(ds2).where(...).equalTo(...)

whereas in the current Scala API it is:

ds1.join(ds2).where(...).isEqualTo(...)

isEqualTo() should be renamed to equalTo(), IMO.

Also, join (+ cross and coGroup?) lacks the with() method because "with" is a keyword in Scala. Should we offer something similar for Scala, or go with map() on Tuple2(left, right)?

2014-09-08 13:51 GMT+02:00 Stephan Ewen <[email protected]>:

> Instead of Strings, Object[][] would work as well. That is a generic
> representation of a Tuple.
>
> Alternatively, they could be stored as Java or Scala Tuples, with a generic
> utility method to convert between the two.
>
> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <[email protected]> wrote:
>
> > Yeah, I ran into the same problem...
> >
> > +1 for using Strings and parsing them, but using the CSVFormat won't work
> > because this is based on a FileInputFormat.
> > So we would need to parse the Strings manually...
> >
> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <[email protected]>:
> >
> > > Hi,
> > > on second thought. Maybe we should just change all the example input
> > > data to strings and use CSV input formats in all the examples. What do
> > > you think?
> > >
> > > Cheers,
> > > Aljoscha
> > >
> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <[email protected]>
> > > wrote:
> > > > Hi,
> > > > yes it's unfortunate that the data types are incompatible. I'm afraid
> > > > you have to do what you proposed: move the data to a static field and
> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
> > > > nice, but copying would duplicate the data and make it easier for it
> > > > to go out of sync in the Java and Scala versions.
> > > >
> > > > What do the others think? This will probably occur in all the examples.
> > > >
> > > > Cheers,
> > > > Aljoscha
> > > >
> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
> > > > <[email protected]> wrote:
> > > >> Hey,
> > > >>
> > > >> I have ported the Connected Components example, but I am not sure how to
> > > >> reuse the example input data from java-examples.
> > > >> In the ConnectedComponentsData class, the vertices and edges data are
> > > >> produced by the methods getDefaultVertexDataSet()
> > > >> and getDefaultEdgeDataSet(), which take
> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
> > > >>
> > > >> One way is to provide public static fields (like in the WordCountData
> > > >> class), but this introduces a conversion
> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
> > > >> for an example (?).
> > > >> Another way is, of course, to copy the example data in the Scala example.
> > > >>
> > > >> Am I missing something here?
> > > >>
> > > >> Thanks!
> > > >>
> > > >> Cheers,
> > > >> V.
> > > >>
> > > >>
> > > >> On 5 September 2014 15:52, Aljoscha Krettek <[email protected]> wrote:
> > > >>
> > > >>> Alright, I updated my repo:
> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> > > >>>
> > > >>> This now has a working WordCount example. It's pretty much a copy of
> > > >>> the Java example with some fixups for the syntax and lambda functions.
> > > >>> You'll also notice that I added the java-examples as a dependency for
> > > >>> the scala-examples. I did this to reuse the example input data.
> > > >>>
> > > >>> When you ported a program you can do a pull request against my repo
> > > >>> and I will collect the examples.
> > > >>>
> > > >>> Happy coding. :D
> > > >>>
> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <[email protected]>
> > > >>> wrote:
> > > >>> > +1
> > > >>> >
> > > >>> > ComputeEdgeDegrees for me!
> > > >>> >
> > > >>> >
> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <[email protected]>
> > > >>> > wrote:
> > > >>> >
> > > >>> >> +1
> > > >>> >>
> > > >>> >> BatchGradientDescent for me :)
> > > >>> >>
> > > >>> >>
> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <[email protected]>
> > > >>> >> wrote:
> > > >>> >>
> > > >>> >> > +1
> > > >>> >> >
> > > >>> >> > I go for WebLogAnalysis.
> > > >>> >> >
> > > >>> >> > My experience with Scala consists of going through a tutorial so this
> > > >>> >> > will be a good stress test both for me and the new API :-)
> > > >>> >> >
> > > >>> >> >
> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <[email protected]>
> > > >>> >> > wrote:
> > > >>> >> >
> > > >>> >> > > +1 for having other people implement the examples!
> > > >>> >> > > Connected Components and Kmeans for me :)
> > > >>> >> > >
> > > >>> >> > > -V.
> > > >>> >> > >
> > > >>> >> > >
> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <[email protected]> wrote:
> > > >>> >> > >
> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
> > > >>> >> > > >
> > > >>> >> > > > Let's also do the examples similar to the Java examples:
> > > >>> >> > > > - running out-of-the-box without parameters
> > > >>> >> > > > - parameters for external data
> > > >>> >> > > > - follow a similar code structure
> > > >>> >> > > >
> > > >>> >> > > >
> > > >>> >> > > >
> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <[email protected]>:
> > > >>> >> > > >
> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
> > > >>> >> > > > >
> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <[email protected]>
> > > >>> >> > > > > wrote:
> > > >>> >> > > > > > Hi,
> > > >>> >> > > > > >
> > > >>> >> > > > > > I think having examples implemented by different people proved to be
> > > >>> >> > > > > > valuable in the past.
> > > >>> >> > > > > > I'd help with two or three examples.
> > > >>> >> > > > > >
> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
> > > >>> >> > > > > >
> > > >>> >> > > > > > Fabian
> > > >>> >> > > > > >
> > > >>> >> > > > > >
> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <[email protected]>:
> > > >>> >> > > > > >
> > > >>> >> > > > > >> Hi,
> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> > > >>> >> > > > > >>
> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
> > > >>> >> > > > > >> in the API?
> > > >>> >> > > > > >>
> > > >>> >> > > > > >> Cheers,
> > > >>> >> > > > > >> Aljoscha
