[GitHub] incubator-flink pull request: hadoopcompatibility: Implementations...

nirvanesque Wed, 27 Aug 2014 06:32:20 -0700

Github user nirvanesque commented on the pull request:

https://github.com/apache/incubator-flink/pull/37#issuecomment-53571986

Hello Artem and mentors,

First of all, warm greetings from INRIA, France.
I hope you had an enjoyable experience in GSoC!
Thanks to Robert (rmetzger) for pointing me here.

At INRIA, we are starting to adopt Stratosphere / Flink.
Our top-level goal is to improve the performance of User Defined Functions
(UDFs) in long workflows that chain multiple MapReduce (M-R) jobs, by using
the larger set of Second Order Functions (SOFs) available in Stratosphere / Flink.
We will demonstrate this improvement by implementing some business Use Cases.
To that end, we have chosen some customer-analysis Use Cases based on
weblogs and related data for 2 companies (both of which appear interested in
trying Stratosphere / Flink):
- a mobile phone app developer: http://www.tribeflame.com
- an anti-virus & Internet security software company: www.f-secure.com
I will be happy to share these Use Cases with you if you are interested.
Just ask me here.

At present, we fit the Alice-Bob-Sam profiles described
in [Artem's GSoC
proposal](https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis).
Hadoop is the starting point for our Stratosphere / Flink journey.
The same is true of the developers at the 2 companies above :-)

Briefly,
We have installed Flink / Stratosphere (versions 0.5.2 and 0.6) and run
some of the example programmes. We use a cluster (Grid5000) for our Hadoop
and Stratosphere installations.
We have a good understanding of Hadoop and of its use through Streaming and
Pipes in conjunction with scripting languages (Python & R specifically).
In the first phase, we would like to run some "Hadoop-like" jobs (mainly
multiple M-R workflows) on Stratosphere, preferably with extensive Java or
Scala programming.
I refer to your [GSoC project
map](https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-%28Project-Map-and-Notes%29)
which seems very interesting.
A Hadoop abstraction layer like the one you describe would be
ideal for our first phase.
In later phases, when we implement complex join and group operations, we
will dive deeper into the Stratosphere / Flink Java or Scala APIs.

Hence, I would like to know the current status of this work:
What has already been implemented, and from which version onwards? How can
we try it?
What is yet to be implemented, and in which versions is it planned?

You may also like to see [my discussion with Robert on this
page](http://flink.incubator.apache.org/docs/0.6-incubating/cli.html#comment-1558297261).
I am still digging through the various discussions, here as well as on JIRA.
Please do point me to the relevant links, JIRA tickets, etc. if that saves
you from retyping long replies.
It will help us catch up quickly with the collective thinking behind
the Stratosphere / Flink roadmap, and eventually contribute to the project.

Thanks in advance,
Anirvan
PS: Apologies for mixing the old and new names (Stratosphere / Flink),
as I am not sure which name is currently in use.



