By the way, what was called BatchGradientDescent in the Scala examples
should be replaced by a port of the LinearRegression Example from
Java. I had them as two separate examples earlier.

What about RelationalQuery and TPC-H-Q3? Any thoughts about removing
RelationalQuery?

On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> I added the Triangle Enumeration Examples, thanks Fabian.
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>
> These are the examples people called dibs on:
>  - PageRank (Fabian)
>  - BatchGradientDescent (Márton)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - The relational Stuff
>  - LinearRegression
>
> On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>> Thanks, I added it. I'll keep a running list of ported/unported
>> examples in my mails. I'll rename the java example package to examples
>> once the Scala API merge is done.
>>
>> I think the termination criterion is fine as it is. Just because Scala
>> enables functional programming doesn't mean it's always the best
>> choice. :D
>>
>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>> WebLogAnalysis, TransitiveClosureNaive
>>
>> These are the examples people called dibs on:
>>  - TriangleEnumeration and PageRank (Fabian)
>>  - BatchGradientDescent (Márton)
>>  - ComputeEdgeDegrees (Hermann)
>>
>> Those are unclaimed (if I'm not mistaken):
>>  - The relational Stuff
>>  - LinearRegression
>>
>> Cheers,
>> Aljoscha
>>
>> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>> Transitive closure here, I also added a termination criterion in the Java
>>> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>>
>>> Perhaps you can make the termination criterion in Scala more functional?
>>>
>>> I noticed that the examples package name is example.java but examples.scala
>>>
>>> Kostas
>>>
>>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <ktzou...@apache.org> wrote:
>>>>
>>>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>>>
>>>> If nobody volunteers for the relational stuff I can take those as well.
>>>>
>>>> How about removing the "RelationalQuery" from both Scala and Java? It
>>>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>>>> top of TPC-H Q3?
>>>>
>>>> Kostas
>>>>
>>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljos...@apache.org>
>>>> wrote:
>>>>>
>>>>> Thanks, I added it, along with an ITCase.
>>>>>
>>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>>> WebLogAnalysis
>>>>>
>>>>> These are the examples people called dibs on:
>>>>>  - TriangleEnumeration and PageRank (Fabian)
>>>>>  - BatchGradientDescent (Márton)
>>>>>  - ComputeEdgeDegrees (Hermann)
>>>>>
>>>>> Those are unclaimed (if I'm not mistaken):
>>>>>  - TransitiveClosure
>>>>>  - The relational Stuff
>>>>>  - LinearRegression
>>>>>
>>>>> Cheers,
>>>>> Aljoscha
>>>>>
>>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzou...@apache.org>
>>>>> wrote:
>>>>> > WebLog here:
>>>>> >
>>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>>> >
>>>>> > Do you need any more done?
>>>>> >
>>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljos...@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> >> I added the ConnectedComponents Example from Vasia.
>>>>> >>
>>>>> >> Keep 'em coming, people. :D
>>>>> >>
>>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhue...@apache.org>
>>>>> >> wrote:
>>>>> >> > Alright, will do.
>>>>> >> > Thanks!
>>>>> >> >
>>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>> >> >
>>>>> >> >> Ok people, executive decision. :D
>>>>> >> >>
>>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>>>>> >> >> in multi-dimensional object arrays and then converting it to the
>>>>> >> >> required Java or Scala objects.
>>>>> >> >>
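A minimal sketch of the idea described above, assuming a hypothetical holder class (SharedPointData), a hypothetical Point type, and made-up sample rows; none of these names or values are taken from the actual KMeansData.java. The data is kept in an untyped Object[][] and converted to typed objects on demand, so a Scala example could read the same array and build its own tuples:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for shared example data; not the actual KMeansData class.
final class SharedPointData {

    // Each row is {id, x, y}, stored untyped so both APIs can convert it.
    // The values are illustrative only.
    static final Object[][] POINTS = {
        {1L, 1.0, 2.0},
        {2L, -3.5, 4.25},
        {3L, 0.0, 0.5}
    };

    // Simple typed object that the Java side converts each row into.
    static final class Point {
        final long id;
        final double x;
        final double y;

        Point(long id, double x, double y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    // Converts the untyped rows into typed Point objects.
    static List<Point> toPoints(Object[][] rows) {
        List<Point> points = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            points.add(new Point((Long) row[0], (Double) row[1], (Double) row[2]));
        }
        return points;
    }
}
```

The casts rely on the array literals being boxed as Long and Double, so a row with the wrong element type would fail at conversion time rather than at compile time.
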
>>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>>>> >> >> Java API.
>>>>> >> >>
>>>>> >> >> Regarding join (and coGroup): there is no need for a keyword, you can
>>>>> >> >> just write:
>>>>> >> >>
>>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>>>> >> >>
>>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhue...@apache.org>
>>>>> >> wrote:
>>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with the
>>>>> >> >> > Java API. In Java join is done as:
>>>>> >> >> >
>>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>>> >> >> >
>>>>> >> >> > where in the current Scala this is:
>>>>> >> >> >
>>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>>>> >> >> >
>>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with"
>>>>> >> >> > is a keyword in Scala. Should we offer something similar for Scala or go
>>>>> >> >> > with map() on Tuple2(left, right)?
>>>>> >> >> >
>>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <se...@apache.org>:
>>>>> >> >> >
>>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>>>>> >> >> >> representation of a Tuple.
>>>>> >> >> >>
>>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a
>>>>> >> >> >> generic utility method to convert between the two.
>>>>> >> >> >>
>>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>>>> >> >> >> <fhue...@apache.org>
>>>>> >> >> wrote:
>>>>> >> >> >>
>>>>> >> >> >> > Yeah, I ran into the same problem...
>>>>> >> >> >> >
>>>>> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't
>>>>> >> >> >> > work because this is based on a FileInputFormat.
>>>>> >> >> >> > So we would need to parse the Strings manually...
>>>>> >> >> >> >
>>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>>>> >> >> >> > <aljos...@apache.org>:
>>>>> >> >> >> >
>>>>> >> >> >> > > Hi,
>>>>> >> >> >> > > on second thought, maybe we should just change all the example input
>>>>> >> >> >> > > data to strings and use CSV input formats in all the examples. What do
>>>>> >> >> >> > > you think?
>>>>> >> >> >> > >
>>>>> >> >> >> > > Cheers,
>>>>> >> >> >> > > Aljoscha
>>>>> >> >> >> > >
>>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
>>>>> >> >> aljos...@apache.org>
>>>>> >> >> >> > > wrote:
>>>>> >> >> >> > > > Hi,
>>>>> >> >> >> > > > yes, it's unfortunate that the data types are incompatible. I'm afraid
>>>>> >> >> >> > > > you have to do what you proposed: move the data to a static field and
>>>>> >> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>>>>> >> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>>>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > What do the others think? This will probably occur in all the examples.
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > Cheers,
>>>>> >> >> >> > > > Aljoscha
>>>>> >> >> >> > > >
>>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>>>>> >> >> >> > > > <vasilikikala...@gmail.com> wrote:
>>>>> >> >> >> > > >> Hey,
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>>>>> >> >> >> > > >> reuse the example input data from java-examples.
>>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>>>>> >> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>>>>> >> >> >> > > >> class), but this introduces a conversion
>>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>>>>> >> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>>>>> >> >> >> > > >> for an example (?).
>>>>> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Am I missing something here?
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Thanks!
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> Cheers,
>>>>> >> >> >> > > >> V.
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>> >> >> >> > > >>
>>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>>>>> >> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>>>>> >> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> Once you have ported a program you can do a pull request against my repo
>>>>> >> >> >> > > >>> and I will collect the examples.
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> Happy coding. :D
>>>>> >> >> >> > > >>>
>>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <
>>>>> >> >> >> reckone...@gmail.com
>>>>> >> >> >> > >
>>>>> >> >> >> > > >>> wrote:
>>>>> >> >> >> > > >>> > +1
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <
>>>>> >> >> >> > > >>> balassi.mar...@gmail.com>
>>>>> >> >> >> > > >>> > wrote:
>>>>> >> >> >> > > >>> >
>>>>> >> >> >> > > >>> >> +1
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <
>>>>> >> >> >> > > ktzou...@apache.org>
>>>>> >> >> >> > > >>> >> wrote:
>>>>> >> >> >> > > >>> >>
>>>>> >> >> >> > > >>> >> > +1
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this
>>>>> >> >> >> > > >>> >> > will be a good stress test both for me and the new API :-)
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <
>>>>> >> >> >> > > >>> >> > vasilikikala...@gmail.com>
>>>>> >> >> >> > > >>> >> > wrote:
>>>>> >> >> >> > > >>> >> >
>>>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > -V.
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <
>>>>> >> >> >> fhue...@apache.org>
>>>>> >> >> >> > > >>> wrote:
>>>>> >> >> >> > > >>> >> > >
>>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <
>>>>> >> >> >> > > aljos...@apache.org
>>>>> >> >> >> > > >>> >:
>>>>> >> >> >> > > >>> >> > > >
>>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>>>>> >> >> >> > > >>> >> > > > >
>>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske
>>>>> >> >> >> > > >>> >> > > > > <
>>>>> >> >> >> > > >>> fhue...@apache.org>
>>>>> >> >> >> > > >>> >> > > > wrote:
>>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>>>>> >> >> >> > > >>> >> > > > > > valuable in the past.
>>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple one first, such as WordCount.
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek
>>>>> >> >> >> > > >>> >> > > > > > <
>>>>> >> >> >> > > >>> aljos...@apache.org
>>>>> >> >> >> > > >>> >> >:
>>>>> >> >> >> > > >>> >> > > > > >
>>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>>>>> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>>>>> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>>>>> >> >> >> > > >>> >> > > > > >> in the API?
>>>>> >> >> >> > > >>> >> > > > > >>
>>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>>>> >> >> >> > > >>> >> > > > > >>
