Re: Cool project: H2O on Flink

2015-01-24 Thread Henry Saputra
This is great news, Kostas and Sri.
Looks like the visit was super useful =)

I would love to jump in and help work on this!

- Henry

On Sat, Jan 24, 2015 at 1:07 PM, Sri Ambati  wrote:
> Kostas,
> Thank you for your generosity.
>
> We are honored to be a part of Apache Flink's community & its amazing 
> journey from Berlin and beyond!
>
> H2O is excited to bring best-in-class machine learning to application 
> developers worldwide.
>
> Looking forward,
> Sri
>
>> On Jan 24, 2015, at 11:39 AM, Kostas Tzoumas  wrote:
>>
>> Hi everyone,
>>
>> I had a chat with some folks behind the H2O project (http://h2o.ai), and 
>> they would be interested in having H2O run on top/inside of Flink. H2O is a 
>> very performant system focused on Machine Learning.
>>
>> A similar integration has been implemented for H2O on Spark (called 
>> sparkling water - https://github.com/h2oai/sparkling-water).
>>
>> This would be a very cool project for someone that is interested in getting 
>> started with Flink, and it should not be too hard to get started by 
>> following sparkling water as a blueprint.
>>
>> So if someone wants to jump on that, it would be great! Michal, the creator 
>> of sparkling water is also willing to guide/give advice.
>>


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Vasiliki Kalavri
Hi,

"mvn clean verify" fails for me on Ubuntu with deleted .m2 repository.
I'm getting the following:

Results :

Failed tests:
  YARNSessionFIFOITCase.setup:56->YarnTestBase.startYARNWithConfig:249 null

YARNSessionCapacitySchedulerITCase.setup:42->YarnTestBase.startYARNWithConfig:249
null

Tests run: 2, Failures: 2, Errors: 0, Skipped: 0

-V.

On 24 January 2015 at 23:03, Fabian Hueske  wrote:

> The build fails also after the .m2 repository was deleted.
>
> Does anybody else have this problem?
>
> 2015-01-24 21:31 GMT+01:00 Stephan Ewen :
>
> > Is this reproducible on a machine when you delete the .m2/repository
> > directory (local maven cache) ?
> >
> > (I currently cannot try that because I am behind a rather low-bandwidth
> > connection and it would take very long to re-download all dependency
> > artifacts)
> >
> > On Sat, Jan 24, 2015 at 5:54 AM, Fabian Hueske 
> wrote:
> >
> > > I just tried to build ("mvn clean install") on a fresh Ubuntu VM. Fails
> > > with the same exception as natively on MacOS.
> > > Something strange is going on...
> > >
> > > 2015-01-24 11:19 GMT+01:00 Fabian Hueske :
> > >
> > > > Thanks Robert! Sounds indeed like an environment problem.
> > > > Will run the tests again and send you the output.
> > > >
> > > > 2015-01-24 11:11 GMT+01:00 Robert Metzger :
> > > >
> > > >> Okay, the tests have finished on my local machine, and they passed.
> So
> > > it
> > > >> looks like an environment specific issue.
> > > >> Maybe the log already helps me figure out what's the issue.
> > > >> We should make sure that our tests are passing on all platforms ;)
> > > >>
> > > >> On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger <
> rmetz...@apache.org
> > >
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > the tests are passing on Travis. Maybe it's an issue with your
> > > >> environment.
> > > >> > I'm currently running the tests on my machine as well, just to
> make
> > > >> sure.
> > > >> > I haven't run the tests on OS X, maybe that's causing the issues.
> > > >> >
> > > >> > Can you send me (privately) the full output of the tests?
> > > >> >
> > > >> > Best,
> > > >> > Robert
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske <
> fhue...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> >> Hi Henry,
> > > >> >>
> > > >> >> running "mvn -DskipTests clean install" before "mvn clean
> install"
> > > did
> > > >> not
> > > >> >> fix the build for me.
> > > >> >> The failing tests are also integration tests (*ITCase) which are
> > only
> > > >> >> executed in Maven's verify phase which is not triggered if you
> run
> > > "mvn
> > > >> >> clean test".
> > > >> >> If I run "mvn test" without "mvn install" it fails for me as well
> > > with
> > > >> the
> > > >> >> error you posted.
> > > >> >>
> > > >> >> So there seem to be at least two build issues with the current
> > > master.
> > > >> >>
> > > >> >> 2015-01-24 1:47 GMT+01:00 Henry Saputra  >:
> > > >> >>
> > > >> >> > Hmm, I think there could be some weird dependencies to get the
> > > Flink
> > > >> >> > YARN uber jar.
> > > >> >> >
> > > >> >> > If you do "mvn clean install -DskipTests" and then call "mvn
> > > >> >> > test", all the tests pass.
> > > >> >> >
> > > >> >> > But if you directly call "mvn clean test" then you see the
> stack
> > I
> > > >> >> > have seen before.
> > > >> >> >
> > > >> >> > - Henry
> > > >> >> >
> > > >> >> >
> > > >> >> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra <
> > > >> henry.sapu...@gmail.com
> > > >> >> >
> > > >> >> > wrote:
> > > >> >> > > Did not see that trace but do see this:
> > > >> >> > >
> > > >> >> > > ---
> > > >> >> > >
> > > >> >> > >  T E S T S
> > > >> >> > >
> > > >> >> > > ---
> > > >> >> > >
> > > >> >> > > Running org.apache.flink.yarn.UtilsTest
> > > >> >> > >
> > > >> >> > > log4j:WARN No such property [append] in
> > > >> >> org.apache.log4j.ConsoleAppender.
> > > >> >> > >
> > > >> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time
> elapsed:
> > > >> 0.476
> > > >> >> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> > > >> >> > >
> > > >> >> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time
> > > elapsed:
> > > >> >> > > 0.405 sec  <<< FAILURE!
> > > >> >> > >
> > > >> >> > > java.lang.AssertionError: null
> > > >> >> > >
> > > >> >> > > at org.junit.Assert.fail(Assert.java:86)
> > > >> >> > >
> > > >> >> > > at org.junit.Assert.assertTrue(Assert.java:41)
> > > >> >> > >
> > > >> >> > > at org.junit.Assert.assertNotNull(Assert.java:621)
> > > >> >> > >
> > > >> >> > > at org.junit.Assert.assertNotNull(Assert.java:631)
> > > >> >> > >
> > > >> >> > > at
> > > >> >>
> > org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> > > >> >> > >
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > Results :
> > > >> >> > >
> > > >> >> > >
> > > >> >> > > Failed tests:
> > > >

[jira] [Created] (FLINK-1445) Add support to enforce local input split assignment

2015-01-24 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1445:


 Summary: Add support to enforce local input split assignment
 Key: FLINK-1445
 URL: https://issues.apache.org/jira/browse/FLINK-1445
 Project: Flink
  Issue Type: New Feature
  Components: Java API, JobManager
Affects Versions: 0.9
Reporter: Fabian Hueske
Priority: Minor


In some scenarios, data sources cannot read data remotely, for example in 
distributed cluster setups where each machine stores data on its local file 
system that is not accessible from other machines.

In order to enable such use cases with Flink, we need to 
1) add support for enforcing local input split reading, and
2) ensure that each input split can be read locally by at least one data source 
task, which requires influencing the scheduling of data source tasks. 
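The feasibility condition behind point 2 can be sketched in plain Java (illustration only; class and method names are hypothetical, not Flink code): an enforced-local assignment is only possible if every split's host set intersects the set of hosts that run data source tasks.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Illustrative sketch (not Flink code): checks whether a strictly local
 * assignment is feasible, i.e. every input split can be served by at least
 * one data source task running on one of the split's storage hosts.
 */
public class LocalAssignmentCheck {

    /** splitHosts maps each split id to the hosts that store its data. */
    static boolean isLocalAssignmentPossible(Map<String, Set<String>> splitHosts,
                                             Set<String> taskHosts) {
        for (Map.Entry<String, Set<String>> e : splitHosts.entrySet()) {
            // A split is locally readable only if a task runs on one of its hosts.
            if (Collections.disjoint(e.getValue(), taskHosts)) {
                return false; // this split could only be read remotely
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> splits = new HashMap<>();
        splits.put("split-0", new HashSet<>(Arrays.asList("host-a", "host-b")));
        splits.put("split-1", new HashSet<>(Collections.singletonList("host-c")));

        // Tasks on host-a and host-c: every split has a local reader.
        System.out.println(isLocalAssignmentPossible(
                splits, new HashSet<>(Arrays.asList("host-a", "host-c"))));
        // No task on host-c: split-1 cannot be read locally.
        System.out.println(isLocalAssignmentPossible(
                splits, new HashSet<>(Collections.singletonList("host-a"))));
    }
}
```

The scheduler-side work in the issue is exactly about making the second case impossible by placing at least one source task on each split's host.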



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1444) Add data properties for data sources

2015-01-24 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1444:


 Summary: Add data properties for data sources
 Key: FLINK-1444
 URL: https://issues.apache.org/jira/browse/FLINK-1444
 Project: Flink
  Issue Type: New Feature
  Components: Java API, JobManager, Optimizer
Affects Versions: 0.9
Reporter: Fabian Hueske
Priority: Minor


This issue proposes to add support for attaching data properties to data 
sources. These data properties are defined with respect to input splits.
Possible properties are:

- partitioning across splits: all elements with the same key (combination) are 
contained in one split
- sorting / grouping within splits: elements are sorted or grouped on certain 
keys within a split
- key uniqueness: a certain key (combination) is unique across all elements of 
the data source. This property is not defined with respect to input splits.

The optimizer can leverage this information to generate more efficient 
execution plans.
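A minimal sketch of such a per-split properties descriptor (hypothetical names, not Flink's actual API) and one way an optimizer check could consume it:

```java
/**
 * Illustrative sketch (hypothetical, not Flink's API): data properties a
 * source could declare with respect to its input splits, as proposed above.
 */
public class SplitDataProperties {
    final boolean partitionedAcrossSplits; // equal keys never span two splits
    final boolean groupedWithinSplit;      // equal keys are contiguous inside a split
    final boolean uniqueKeys;              // keys are unique across the whole source

    public SplitDataProperties(boolean partitioned, boolean grouped, boolean unique) {
        this.partitionedAcrossSplits = partitioned;
        this.groupedWithinSplit = grouped;
        this.uniqueKeys = unique;
    }

    /**
     * Example optimizer decision: a grouping operation needs no shuffle or
     * sort if the data is already partitioned across splits and grouped
     * within each split.
     */
    public boolean allowsLocalGrouping() {
        return partitionedAcrossSplits && groupedWithinSplit;
    }
}
```

In a real optimizer these flags would of course be per-key-field rather than booleans; the sketch only shows the shape of the information flow.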



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1443) Add replicated data source

2015-01-24 Thread Fabian Hueske (JIRA)
Fabian Hueske created FLINK-1443:


 Summary: Add replicated data source
 Key: FLINK-1443
 URL: https://issues.apache.org/jira/browse/FLINK-1443
 Project: Flink
  Issue Type: New Feature
  Components: Java API, JobManager, Optimizer
Affects Versions: 0.9
Reporter: Fabian Hueske
Priority: Minor


This issue proposes to add support for data sources that read the same data in 
all parallel instances. This feature can be useful, if the data is replicated 
to all machines in a cluster and can be locally read. 
For example, a replicated input format can be used for a broadcast join without 
sending any data over the network.

The following changes are necessary to achieve this:
1) Add a replicating InputSplitAssigner which assigns all splits to all 
parallel instances. This also requires extending the InputSplitAssigner 
interface to identify the exact parallel instance that requests an InputSplit 
(currently only the hostname is provided).
2) Make sure that the degree of parallelism (DOP) of the replicated data source 
is identical to the DOP of its successor.
3) Let the optimizer know that the data is replicated and ensure that plan 
enumeration works correctly.
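The replicating assigner from point 1 can be sketched as follows (hypothetical shape, not Flink's actual InputSplitAssigner interface): instead of handing each split out exactly once, it returns the full split list to every requesting parallel subtask.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of a replicating split assigner (illustration only, not Flink's
 * interface): every parallel subtask receives all splits, so each instance
 * reads the complete, locally replicated data set.
 */
public class ReplicatingSplitAssigner {
    private final List<String> splits;

    public ReplicatingSplitAssigner(List<String> splits) {
        this.splits = splits;
    }

    /**
     * Takes the subtask index rather than just a hostname: as the issue
     * notes, the assigner must identify the exact parallel instance that
     * requests splits, otherwise two subtasks on one host are ambiguous.
     */
    public List<String> getSplitsFor(int subtaskIndex) {
        return new ArrayList<>(splits); // every subtask reads all splits
    }
}
```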



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
The build fails also after the .m2 repository was deleted.

Does anybody else have this problem?

2015-01-24 21:31 GMT+01:00 Stephan Ewen :

> Is this reproducible on a machine when you delete the .m2/repository
> directory (local maven cache) ?
>
> (I currently cannot try that because I am behind a rather low-bandwidth
> connection and it would take very long to re-download all dependency
> artifacts)
>
> On Sat, Jan 24, 2015 at 5:54 AM, Fabian Hueske  wrote:
>
> > I just tried to build ("mvn clean install") on a fresh Ubuntu VM. Fails
> > with the same exception as natively on MacOS.
> > Something strange is going on...
> >
> > 2015-01-24 11:19 GMT+01:00 Fabian Hueske :
> >
> > > Thanks Robert! Sounds indeed like an environment problem.
> > > Will run the tests again and send you the output.
> > >
> > > 2015-01-24 11:11 GMT+01:00 Robert Metzger :
> > >
> > >> Okay, the tests have finished on my local machine, and they passed. So
> > it
> > >> looks like an environment specific issue.
> > >> Maybe the log already helps me figure out what's the issue.
> > >> We should make sure that our tests are passing on all platforms ;)
> > >>
> > >> On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger  >
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > the tests are passing on Travis. Maybe it's an issue with your
> > >> environment.
> > >> > I'm currently running the tests on my machine as well, just to make
> > >> sure.
> > >> > I haven't run the tests on OS X, maybe that's causing the issues.
> > >> >
> > >> > Can you send me (privately) the full output of the tests?
> > >> >
> > >> > Best,
> > >> > Robert
> > >> >
> > >> >
> > >> >
> > >> > On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske 
> > >> wrote:
> > >> >
> > >> >> Hi Henry,
> > >> >>
> > >> >> running "mvn -DskipTests clean install" before "mvn clean install"
> > did
> > >> not
> > >> >> fix the build for me.
> > >> >> The failing tests are also integration tests (*ITCase) which are
> only
> > >> >> executed in Maven's verify phase which is not triggered if you run
> > "mvn
> > >> >> clean test".
> > >> >> If I run "mvn test" without "mvn install" it fails for me as well
> > with
> > >> the
> > >> >> error you posted.
> > >> >>
> > >> >> So there seem to be at least two build issues with the current
> > master.
> > >> >>
> > >> >> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
> > >> >>
> > >> >> > Hmm, I think there could be some weird dependencies to get the
> > Flink
> > >> >> > YARN uber jar.
> > >> >> >
> > >> >> > If you do "mvn clean install -DskipTests" and then call "mvn
> > >> >> > test", all the tests pass.
> > >> >> >
> > >> >> > But if you directly call "mvn clean test" then you see the stack
> I
> > >> >> > have seen before.
> > >> >> >
> > >> >> > - Henry
> > >> >> >
> > >> >> >
> > >> >> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra <
> > >> henry.sapu...@gmail.com
> > >> >> >
> > >> >> > wrote:
> > >> >> > > Did not see that trace but do see this:
> > >> >> > >
> > >> >> > > ---
> > >> >> > >
> > >> >> > >  T E S T S
> > >> >> > >
> > >> >> > > ---
> > >> >> > >
> > >> >> > > Running org.apache.flink.yarn.UtilsTest
> > >> >> > >
> > >> >> > > log4j:WARN No such property [append] in
> > >> >> org.apache.log4j.ConsoleAppender.
> > >> >> > >
> > >> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
> > >> 0.476
> > >> >> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> > >> >> > >
> > >> >> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time
> > elapsed:
> > >> >> > > 0.405 sec  <<< FAILURE!
> > >> >> > >
> > >> >> > > java.lang.AssertionError: null
> > >> >> > >
> > >> >> > > at org.junit.Assert.fail(Assert.java:86)
> > >> >> > >
> > >> >> > > at org.junit.Assert.assertTrue(Assert.java:41)
> > >> >> > >
> > >> >> > > at org.junit.Assert.assertNotNull(Assert.java:621)
> > >> >> > >
> > >> >> > > at org.junit.Assert.assertNotNull(Assert.java:631)
> > >> >> > >
> > >> >> > > at
> > >> >>
> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> > >> >> > >
> > >> >> > >
> > >> >> > >
> > >> >> > > Results :
> > >> >> > >
> > >> >> > >
> > >> >> > > Failed tests:
> > >> >> > >
> > >> >> > >   UtilsTest.testUberjarLocator:32 null
> > >> >> > >
> > >> >> > >
> > >> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> > >> >> > >
> > >> >> > >
> > >> >> > > - Henry
> > >> >> > >
> > >> >> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske <
> > fhue...@gmail.com>
> > >> >> > wrote:
> > >> >> > >> Hi all,
> > >> >> > >>
> > >> >> > >> I tried to build the current master (mvn clean install) and
> some
> > >> >> tests
> > >> >> > in
> > >> >> > >> the flink-yarn-tests module fail:
> > >> >> > >>
> > >> >> > >> Failed tests:
> > >> >> > >>
> > >> >> > >>
> > >> >> > >>
> > >> >> >
> > >> >>
> > >>
> >
> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
> >

Re: Adding non-core API features to Flink

2015-01-24 Thread Ted Dunning
As the community of Flink add-ons grows, a CPAN- or Maven-like mechanism
might be a nice option.  That would let people download and install
extensions very fluidly.

The argument for making Apache contributions is definitely valid, but the
argument for the agility of fostering independent projects is that projects
can gain lots of popularity very quickly this way.  CPAN, CRAN, pip, maven
and RubyGems can be argued to be critical components of the popularity of
Perl, R, Python, Java/Scala and Ruby respectively.




On Sat, Jan 24, 2015 at 4:26 PM, Fabian Hueske  wrote:

> I am also more in favor of option 1).
>
> 2015-01-24 20:27 GMT+01:00 Kostas Tzoumas :
>
> > Thanks Fabian for starting the discussion.
> >
> > I would be biased towards option (1) that Stephan highlighted for the
> > following reasons:
> >
> > - A separate github project is one more infrastructure to manage, and it
> > lives outside the ASF. I would like to bring as much code as possible to
> > the Apache Software Foundation, and not divide the codebase into two
> > separate repositories.
> >
> > - The personal gratification (and thus motivation) is higher when
> > contributing to a top-level Apache project than a github repository
> > slightly associated with an ASF project. And contributors to the Flink
> > project get karma that may lead to new committers, which is crucial as
> the
> > project is growing.
> >
> > Of course, non Apache-licensed contributions cannot be accepted. If we
> have
> > a good amount of those, we can start an infrastructure for Flink packages
> > that lives outside the ASF, but I would wait for the need to come before
> > doing this.
> >
> > My proposal would be to funnel contributions to the main repository (in a
> > flink-contrib module) for now, including the recent contributions.
> >
> > Kostas
> >
> >
> > On Sat, Jan 24, 2015 at 11:15 AM, Stephan Ewen  wrote:
> >
> > > Yes, a "flink-contrib" project would be great.
> > >
> > > We have two options:
> > >
> > > 1) Make it part of the flink apache project.
> > >   - PRO this makes it easy to get stuff for users
> > >   - CONTRA this means stronger requirements on the code, blocker for
> code
> > > that uses dependencies under certain licenses, etc.
> > >
> > > 2) Make an independent github project.
> > >  - PRO contributions can depend on more licenses, such as LGPL
> > >  - PRO we can have more people that commit to this repo, committers can
> > be
> > > different from flink committers
> > >  - CONTRA people need to grab the extensions from a different location
> > >
> > >
> > > I am slightly biased towards (2), but open to both.
> > >
> > > Stephan
> > >
> > >
> > >
> > >
> > > On Sat, Jan 24, 2015 at 5:29 AM, Chiwan Park 
> > > wrote:
> > >
> > > > I think a top-level Maven module called "flink-contrib" is reasonable.
> > > > There are other projects with a contrib package, such as Akka and
> > > > Django.
> > > >
> > > > Regards, Chiwan Park (Sent with iPhone)
> > > >
> > > > On Jan 24, 2015, at 7:15 PM, Fabian Hueske  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > we got a few contribution requests lately to add cool but
> "non-core"
> > > > > features to our API.
> > > > > In previous discussions, concerns were raised to not bloat the APIs
> > > with
> > > > > too many "shortcut", "syntactic sugar", or special-case features.
> > > > >
> > > > > Instead we could setup a place to add Input/OutputFormats, common
> > > > > operations, etc. which does not need as much control as the core
> > APIs.
> > > > Open
> > > > > questions are:
> > > > > - How do we organize it? (top-level maven module, modules in
> > > flink-java,
> > > > > flink-scala, java packages in the API modules, ...)
> > > > > - How do we name it? flink-utils, flink-packages, ...
> > > > >
> > > > > Any opinions on this?
> > > > >
> > > > > Cheers, Fabian
> > > >
> > >
> >
>


Webclient and visualizer support for streaming programs

2015-01-24 Thread Gyula Fóra
Hey guys,

I opened a pull request which adds support to the webclient for executing
and visualizing streaming programs.

I had to make modifications to the clients and the way plans are handled,
so someone should definitely review it :)

https://github.com/apache/flink/pull/334

Cheers,
Gyula


Re: Naming of semantic annotations

2015-01-24 Thread Stephan Ewen
I agree with ForwardFields as well.

I vaguely remember that Joe Harjung (when working on the first Scala API
version) called it the CopySet. I would assume that ForwardFields is more
intuitive to most people.

 I only mention this, because Joe was one of the few English native
speakers in the team. Would be nice to have a comment by another English
native speaker ;-)



On Fri, Jan 23, 2015 at 1:51 PM, Chesnay Schepler <
chesnay.schep...@fu-berlin.de> wrote:

> +1 ForwardedFields
>
>
> On 23.01.2015 22:38, Vasiliki Kalavri wrote:
>
>> Hi,
>>
>> +1 for ForwardedFields. I like it much more than ConstantFields.
>> I think it makes it clear what the feature does.
>>
>> It's a very cool feature and indeed not advertised a lot. I use it when I
>> remember, but most of the times I forget it exists ;)
>>
>> -V.
>>
>> On 23 January 2015 at 22:12, Fabian Hueske  wrote:
>>
>>  Hi all,
>>>
>>> I have a pending pull request (#311) to fix and enable semantic
>>> information
>>> for functions with nested and Pojo types.
>>> Semantic information is used to tell the optimizer about the behavior of
>>> user-defined functions.
>>> The optimizer can use this information to generate more efficient
>>> execution
>>> plans.
>>>
>>> Assume for example a data set which is partitioned on the first field of
>>> a
>>> tuple and which is given to a Map function. If the optimizer knows, that
>>> the Map function does not modify the first field, it can infer that the
>>> data is still partitioned after the Map function was applied.
>>>
>>> There are two ways to give semantic information for user-defined
>>> function:
>>> 1) Class annotations:
>>> @ConstantFields("0; 1->2")
>>> public class MyMapper extends MapFunction<...> { }
>>>
>>> 2) Inline data flow:
>>> data.map(new MapFunction<...>() {...}).withConstantSet("0; 1->2");
>>>
>>> In both cases the semantic annotation indicates that the first field (0)
>>> is
>>> preserved and the second field of the input (1) is forwarded to the third
>>> field of the output (2).
>>>
>>> The question is how should we name this feature?
>>> Right now it is inconsistently called "ConstantField" and "ConstantSet".
>>>
>>> I would prefer the name ForwardedFields because this indicates that
>>> fields
>>> are "forwarded" through the function and possibly also moved to another
>>> location. It would however, change the API (although I don't think this
>>> feature is often used because it was not advertised a lot).
>>>
>>> Any other suggestions or opinions on this?
>>>
>>> Cheers, Fabian
>>>
>>>
>
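The forwarded-field expression discussed in the thread above ("0; 1->2") can be illustrated with a small parser (illustration only; the class name is hypothetical and this is not Flink's actual implementation): a bare index means the field keeps its position, and "a->b" means input field a ends up at output field b.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: interpreting a forwarded-field expression such as "0; 1->2".
 * "0" means input field 0 is copied to output field 0; "1->2" means input
 * field 1 is forwarded to output field 2.
 */
public class ForwardedFieldsDemo {

    /** Returns a map from input field index to output field index. */
    static Map<Integer, Integer> parse(String expr) {
        Map<Integer, Integer> inToOut = new HashMap<>();
        for (String part : expr.split(";")) {
            String[] io = part.trim().split("->");
            int in = Integer.parseInt(io[0].trim());
            // A bare index forwards the field to the same position.
            int out = (io.length == 1) ? in : Integer.parseInt(io[1].trim());
            inToOut.put(in, out);
        }
        return inToOut;
    }
}
```

An optimizer can use such a mapping to conclude, e.g., that a partitioning on input field 0 still holds on output field 0 after the function runs.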


Re: Adding non-core API features to Flink

2015-01-24 Thread Fabian Hueske
I am also more in favor of option 1).

2015-01-24 20:27 GMT+01:00 Kostas Tzoumas :

> Thanks Fabian for starting the discussion.
>
> I would be biased towards option (1) that Stephan highlighted for the
> following reasons:
>
> - A separate github project is one more infrastructure to manage, and it
> lives outside the ASF. I would like to bring as much code as possible to
> the Apache Software Foundation, and not divide the codebase into two
> separate repositories.
>
> - The personal gratification (and thus motivation) is higher when
> contributing to a top-level Apache project than a github repository
> slightly associated with an ASF project. And contributors to the Flink
> project get karma that may lead to new committers, which is crucial as the
> project is growing.
>
> Of course, non Apache-licensed contributions cannot be accepted. If we have
> a good amount of those, we can start an infrastructure for Flink packages
> that lives outside the ASF, but I would wait for the need to come before
> doing this.
>
> My proposal would be to funnel contributions to the main repository (in a
> flink-contrib module) for now, including the recent contributions.
>
> Kostas
>
>
> On Sat, Jan 24, 2015 at 11:15 AM, Stephan Ewen  wrote:
>
> > Yes, a "flink-contrib" project would be great.
> >
> > We have two options:
> >
> > 1) Make it part of the flink apache project.
> >   - PRO this makes it easy to get stuff for users
> >   - CONTRA this means stronger requirements on the code, blocker for code
> > that uses dependencies under certain licenses, etc.
> >
> > 2) Make an independent github project.
> >  - PRO contributions can depend on more licenses, such as LGPL
> >  - PRO we can have more people that commit to this repo, committers can
> be
> > different from flink committers
> >  - CONTRA people need to grab the extensions from a different location
> >
> >
> > I am slightly biased towards (2), but open to both.
> >
> > Stephan
> >
> >
> >
> >
> > On Sat, Jan 24, 2015 at 5:29 AM, Chiwan Park 
> > wrote:
> >
> > > I think a top-level Maven module called "flink-contrib" is reasonable.
> > > There are other projects with a contrib package, such as Akka and Django.
> > >
> > > Regards, Chiwan Park (Sent with iPhone)
> > >
> > > On Jan 24, 2015, at 7:15 PM, Fabian Hueske  wrote:
> > >
> > > > Hi all,
> > > >
> > > > we got a few contribution requests lately to add cool but "non-core"
> > > > features to our API.
> > > > In previous discussions, concerns were raised to not bloat the APIs
> > with
> > > > too many "shortcut", "syntactic sugar", or special-case features.
> > > >
> > > > Instead we could setup a place to add Input/OutputFormats, common
> > > > operations, etc. which does not need as much control as the core
> APIs.
> > > Open
> > > > questions are:
> > > > - How do we organize it? (top-level maven module, modules in
> > flink-java,
> > > > flink-scala, java packages in the API modules, ...)
> > > > - How do we name it? flink-utils, flink-packages, ...
> > > >
> > > > Any opinions on this?
> > > >
> > > > Cheers, Fabian
> > >
> >
>


Re: Cool project: H2O on Flink

2015-01-24 Thread Sri Ambati
Kostas, 
Thank you for your generosity.

We are honored to be a part of Apache Flink's community & its amazing journey 
from Berlin and beyond!

H2O is excited to bring best-in-class machine learning to application 
developers worldwide. 

Looking forward,
Sri

> On Jan 24, 2015, at 11:39 AM, Kostas Tzoumas  wrote:
> 
> Hi everyone,
> 
> I had a chat with some folks behind the H2O project (http://h2o.ai), and they 
> would be interested in having H2O run on top/inside of Flink. H2O is a very 
> performant system focused on Machine Learning.
> 
> A similar integration has been implemented for H2O on Spark (called sparkling 
> water - https://github.com/h2oai/sparkling-water). 
> 
> This would be a very cool project for someone that is interested in getting 
> started with Flink, and it should not be too hard to get started by following 
> sparkling water as a blueprint. 
> 
> So if someone wants to jump on that, it would be great! Michal, the creator 
> of sparkling water is also willing to guide/give advice.
> 


[jira] [Created] (FLINK-1442) Archived Execution Graph consumes too much memory

2015-01-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1442:
---

 Summary: Archived Execution Graph consumes too much memory
 Key: FLINK-1442
 URL: https://issues.apache.org/jira/browse/FLINK-1442
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen


The JobManager archives the execution graphs, for analysis of jobs. The graphs 
may consume a lot of memory.

Especially the execution edges in all2all connection patterns are extremely 
many and add up in memory consumption.

The execution edges connect all parallel tasks. So for an all2all pattern 
between n and m tasks, there are n*m edges. For parallelisms of several hundred 
tasks, this can easily reach 100k objects and more, each with a set of metadata.

I propose the following to solve that:

1. Clear all execution edges from the graph (the majority of the memory 
consumers) when it is given to the archiver.

2. Keep the map/list of the archived graphs behind a soft reference, so it will 
be removed under memory pressure before the JVM crashes. That may remove graphs 
from the history early, but is much preferable to the JVM crashing, in which 
case the graphs are lost as well...

3. Long term: The graph should be archived somewhere else. Something like the 
history server used by Hadoop and Hive would be a good idea.
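Point 2 can be sketched with the JDK's java.lang.ref.SoftReference, which the garbage collector is allowed to clear under memory pressure before throwing an OutOfMemoryError (a minimal sketch with a String standing in for the execution graph; names are illustrative, not Flink's):

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a soft-referenced archive: entries can be reclaimed by the GC
 * under memory pressure instead of the JVM crashing, at the cost of possibly
 * losing old history entries.
 */
public class SoftArchive {
    private final Map<String, SoftReference<String>> archivedGraphs =
            new LinkedHashMap<>();

    public void archive(String jobId, String graph) {
        archivedGraphs.put(jobId, new SoftReference<>(graph));
    }

    /** Returns null if the entry is unknown or was collected by the GC. */
    public String get(String jobId) {
        SoftReference<String> ref = archivedGraphs.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

Callers must therefore always handle a null result, since an archived graph may have been reclaimed at any time.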




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Stephan Ewen
Is this reproducible on a machine when you delete the .m2/repository
directory (local maven cache) ?

(I currently cannot try that because I am behind a rather low-bandwidth
connection and it would take very long to re-download all dependency artifacts)

On Sat, Jan 24, 2015 at 5:54 AM, Fabian Hueske  wrote:

> I just tried to build ("mvn clean install") on a fresh Ubuntu VM. Fails
> with the same exception as natively on MacOS.
> Something strange is going on...
>
> 2015-01-24 11:19 GMT+01:00 Fabian Hueske :
>
> > Thanks Robert! Sounds indeed like an environment problem.
> > Will run the tests again and send you the output.
> >
> > 2015-01-24 11:11 GMT+01:00 Robert Metzger :
> >
> >> Okay, the tests have finished on my local machine, and they passed. So
> it
> >> looks like an environment specific issue.
> >> Maybe the log already helps me figure out what's the issue.
> >> We should make sure that our tests are passing on all platforms ;)
> >>
> >> On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger 
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > the tests are passing on Travis. Maybe it's an issue with your
> >> environment.
> >> > I'm currently running the tests on my machine as well, just to make
> >> sure.
> >> > I haven't run the tests on OS X, maybe that's causing the issues.
> >> >
> >> > Can you send me (privately) the full output of the tests?
> >> >
> >> > Best,
> >> > Robert
> >> >
> >> >
> >> >
> >> > On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske 
> >> wrote:
> >> >
> >> >> Hi Henry,
> >> >>
> >> >> running "mvn -DskipTests clean install" before "mvn clean install"
> did
> >> not
> >> >> fix the build for me.
> >> >> The failing tests are also integration tests (*ITCase) which are only
> >> >> executed in Maven's verify phase which is not triggered if you run
> "mvn
> >> >> clean test".
> >> >> If I run "mvn test" without "mvn install" it fails for me as well
> with
> >> the
> >> >> error you posted.
> >> >>
> >> >> So there seem to be at least two build issues with the current
> master.
> >> >>
> >> >> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
> >> >>
> >> >> > Hmm, I think there could be some weird dependencies to get the
> Flink
> >> >> > YARN uber jar.
> >> >> >
> >> >> > If you do "mvn clean install -DskipTests" and then call "mvn test",
> >> >> > all the tests pass.
> >> >> >
> >> >> > But if you directly call "mvn clean test" then you see the stack I
> >> >> > have seen before.
> >> >> >
> >> >> > - Henry
> >> >> >
> >> >> >
> >> >> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra <
> >> henry.sapu...@gmail.com
> >> >> >
> >> >> > wrote:
> >> >> > > Did not see that trace but do see this:
> >> >> > >
> >> >> > > ---
> >> >> > >
> >> >> > >  T E S T S
> >> >> > >
> >> >> > > ---
> >> >> > >
> >> >> > > Running org.apache.flink.yarn.UtilsTest
> >> >> > >
> >> >> > > log4j:WARN No such property [append] in
> >> >> org.apache.log4j.ConsoleAppender.
> >> >> > >
> >> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
> >> 0.476
> >> >> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> >> >> > >
> >> >> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time
> elapsed:
> >> >> > > 0.405 sec  <<< FAILURE!
> >> >> > >
> >> >> > > java.lang.AssertionError: null
> >> >> > >
> >> >> > > at org.junit.Assert.fail(Assert.java:86)
> >> >> > >
> >> >> > > at org.junit.Assert.assertTrue(Assert.java:41)
> >> >> > >
> >> >> > > at org.junit.Assert.assertNotNull(Assert.java:621)
> >> >> > >
> >> >> > > at org.junit.Assert.assertNotNull(Assert.java:631)
> >> >> > >
> >> >> > > at
> >> >> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > Results :
> >> >> > >
> >> >> > >
> >> >> > > Failed tests:
> >> >> > >
> >> >> > >   UtilsTest.testUberjarLocator:32 null
> >> >> > >
> >> >> > >
> >> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> >> >> > >
> >> >> > >
> >> >> > > - Henry
> >> >> > >
> >> >> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske <
> fhue...@gmail.com>
> >> >> > wrote:
> >> >> > >> Hi all,
> >> >> > >>
> >> >> > >> I tried to build the current master (mvn clean install) and some
> >> >> tests
> >> >> > in
> >> >> > >> the flink-yarn-tests module fail:
> >> >> > >>
> >> >> > >> Failed tests:
> >> >> > >>
> >> >> > >>
> >> >> > >>
> >> >> >
> >> >>
> >>
> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
> >> >> > >> During the timeout period of 60 seconds the expected string did
> >> not
> >> >> > show up
> >> >> > >>
> >> >> > >>
> >> >>
> YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
> >> >> > >> There is at least one application on the cluster is not finished
> >> >> > >>
> >> >> > >>
> >> >> >
> >> >>
> >>
> YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
> >> >> > >> During the timeout period of 60 s

Re: Warning message in Scala type analysis

2015-01-24 Thread Stephan Ewen
Not 100%

My guess is that it comes from the Scala tests in flink-tests for POJOs
containing Joda-Time classes (to test the custom serializers).

Stephan


On Sat, Jan 24, 2015 at 12:16 PM, Aljoscha Krettek 
wrote:

> Yes, I will look into it.
>
> Are you sure this happens in the Scala code?
>
> On Sat, Jan 24, 2015 at 8:57 PM, Stephan Ewen  wrote:
> > Hi!
> >
> > When running a recent build, I am seeing the following error message in
> the
> > "flink-tests" project.
> >
> > [WARNING] warning: Class org.joda.convert.ToString not found - continuing
> > with a stub.
> >
> > @aljoscha This is probably a message generated by the Scala type
> analyzer.
> > Can you elaborate what this means?
> >
> > Greetings,
> > Stephan
>


Re: Warning message in Scala type analysis

2015-01-24 Thread Aljoscha Krettek
Yes, I will look into it.

Are you sure this happens in the Scala code?

On Sat, Jan 24, 2015 at 8:57 PM, Stephan Ewen  wrote:
> Hi!
>
> When running a recent build, I am seeing the following error message in the
> "flink-tests" project.
>
> [WARNING] warning: Class org.joda.convert.ToString not found - continuing
> with a stub.
>
> @aljoscha This is probably a message generated by the Scala type analyzer.
> Can you elaborate what this means?
>
> Greetings,
> Stephan


Warning message in Scala type analysis

2015-01-24 Thread Stephan Ewen
Hi!

When running a recent build, I am seeing the following warning message in the
"flink-tests" project.

[WARNING] warning: Class org.joda.convert.ToString not found - continuing
with a stub.

@aljoscha This is probably a message generated by the Scala type analyzer.
Can you elaborate what this means?

Greetings,
Stephan


Cool project: H2O on Flink

2015-01-24 Thread Kostas Tzoumas
Hi everyone,

I had a chat with some folks behind the H2O project (http://h2o.ai), and
they would be interested in having H2O run on top/inside of Flink. H2O is a
very performant system focused on Machine Learning.

A similar integration has been implemented for H2O on Spark (called
sparkling water - https://github.com/h2oai/sparkling-water).

This would be a very cool project for someone that is interested in getting
started with Flink, and it should not be too hard to get started by
following sparkling water as a blueprint.

So if someone wants to jump on that, it would be great! Michal, the creator
of sparkling water is also willing to guide/give advice.


Re: Adding non-core API features to Flink

2015-01-24 Thread Kostas Tzoumas
Thanks Fabian for starting the discussion.

I would be biased towards option (1) that Stephan highlighted for the
following reasons:

- A separate GitHub project is one more piece of infrastructure to manage, and it
lives outside the ASF. I would like to bring as much code as possible to
the Apache Software Foundation, and not divide the codebase into two
separate repositories.

- The personal gratification (and thus motivation) is higher when
contributing to a top-level Apache project than to a GitHub repository
loosely associated with an ASF project. And contributors to the Flink
project get karma that may lead to new committers, which is crucial as the
project is growing.

Of course, non-Apache-licensed contributions cannot be accepted. If we have
a good amount of those, we can start an infrastructure for Flink packages
that lives outside the ASF, but I would wait for the need to come before
doing this.

My proposal would be to funnel contributions to the main repository (in a
flink-contrib module) for now, including the recent contributions.

Kostas


On Sat, Jan 24, 2015 at 11:15 AM, Stephan Ewen  wrote:

> Yes, a "flink-contrib" project would be great.
>
> We have two options:
>
> 1) Make it part of the Apache Flink project.
>   - PRO this makes it easy to get stuff for users
>   - CONTRA this means stronger requirements on the code, blocker for code
> that uses dependencies under certain licenses, etc.
>
> 2) Make an independent github project.
>  - PRO contributions can depend on more licenses, such as LGPL
>  - PRO we can have more people that commit to this repo, committers can be
> different from flink committers
>  - CONTRA people need to grab the extensions from a different location
>
>
> I am slightly biased towards (2), but open to both.
>
> Stephan
>
>
>
>
> On Sat, Jan 24, 2015 at 5:29 AM, Chiwan Park 
> wrote:
>
> > I think a top-level Maven module called "flink-contrib" is reasonable. There
> > are other projects with a contrib package, such as Akka and Django.
> >
> > Regards, Chiwan Park (Sent with iPhone)
> >
> > On Jan 24, 2015, at 7:15 PM, Fabian Hueske  wrote:
> >
> > > Hi all,
> > >
> > > we got a few contribution requests lately to add cool but "non-core"
> > > features to our API.
> > > In previous discussions, concerns were raised about bloating the APIs with
> > > too many "shortcut", "syntactic sugar", or special-case features.
> > >
> > > Instead we could set up a place to add Input/OutputFormats, common
> > > operations, etc. that do not need as much control as the core APIs. Open
> > > questions are:
> > > - How do we organize it? (top-level maven module, modules in
> flink-java,
> > > flink-scala, java packages in the API modules, ...)
> > > - How do we name it? flink-utils, flink-packages, ...
> > >
> > > Any opinions on this?
> > >
> > > Cheers, Fabian
> >
>


Re: Adding non-core API features to Flink

2015-01-24 Thread Stephan Ewen
Yes, a "flink-contrib" project would be great.

We have two options:

1) Make it part of the Apache Flink project.
  - PRO this makes it easy to get stuff for users
  - CONTRA this means stronger requirements on the code, blocker for code
that uses dependencies under certain licenses, etc.

2) Make an independent github project.
 - PRO contributions can depend on more licenses, such as LGPL
 - PRO we can have more people that commit to this repo, committers can be
different from flink committers
 - CONTRA people need to grab the extensions from a different location


I am slightly biased towards (2), but open to both.

Stephan




On Sat, Jan 24, 2015 at 5:29 AM, Chiwan Park  wrote:

> I think a top-level Maven module called "flink-contrib" is reasonable. There
> are other projects with a contrib package, such as Akka and Django.
>
> Regards, Chiwan Park (Sent with iPhone)
>
> On Jan 24, 2015, at 7:15 PM, Fabian Hueske  wrote:
>
> > Hi all,
> >
> > we got a few contribution requests lately to add cool but "non-core"
> > features to our API.
> > In previous discussions, concerns were raised about bloating the APIs with
> > too many "shortcut", "syntactic sugar", or special-case features.
> >
> > Instead we could set up a place to add Input/OutputFormats, common
> > operations, etc. that do not need as much control as the core APIs. Open
> > questions are:
> > - How do we organize it? (top-level maven module, modules in flink-java,
> > flink-scala, java packages in the API modules, ...)
> > - How do we name it? flink-utils, flink-packages, ...
> >
> > Any opinions on this?
> >
> > Cheers, Fabian
>


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
I just tried to build ("mvn clean install") on a fresh Ubuntu VM. Fails
with the same exception as natively on MacOS.
Something strange is going on...

2015-01-24 11:19 GMT+01:00 Fabian Hueske :

> Thanks Robert! Sounds indeed like an environment problem.
> Will run the tests again and send you the output.
>
> 2015-01-24 11:11 GMT+01:00 Robert Metzger :
>
>> Okay, the tests have finished on my local machine, and they passed. So it
>> looks like an environment-specific issue.
>> Maybe the log already helps me figure out what's the issue.
>> We should make sure that our tests are passing on all platforms ;)
>>
>> On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger 
>> wrote:
>>
>> > Hi,
>> >
>> > the tests are passing on Travis. Maybe it's an issue with your
>> > environment.
>> > I'm currently running the tests on my machine as well, just to make
>> sure.
>> > I haven't run the tests on OS X, maybe that's causing the issues.
>> >
>> > Can you send me (privately) the full output of the tests?
>> >
>> > Best,
>> > Robert
>> >
>> >
>> >
>> > On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske 
>> wrote:
>> >
>> >> Hi Henry,
>> >>
>> >> running "mvn -DskipTests clean install" before "mvn clean install" did
>> not
>> >> fix the build for me.
>> >> The failing tests are also integration tests (*ITCase) which are only
>> >> executed in Maven's verify phase which is not triggered if you run "mvn
>> >> clean test".
>> >> If I run "mvn test" without "mvn install" it fails for me as well with
>> the
>> >> error you posted.
>> >>
>> >> So there seem to be at least two build issues with the current master.
>> >>
>> >> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
>> >>
>> >> > Hmm, I think there could be some weird dependencies to get the Flink
>> >> > YARN uber jar.
>> >> >
>> >> > If you do "mvn clean install -DskipTests" then call "mvn test" all
>> the
>> >> > tests passed.
>> >> >
>> >> > But if you directly call "mvn clean test" then you see the stack I
>> >> > have seen before.
>> >> >
>> >> > - Henry
>> >> >
>> >> >
>> >> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra <
>> henry.sapu...@gmail.com
>> >> >
>> >> > wrote:
>> >> > > Did not see that trace but do see this:
>> >> > >
>> >> > > ---
>> >> > >
>> >> > >  T E S T S
>> >> > >
>> >> > > ---
>> >> > >
>> >> > > Running org.apache.flink.yarn.UtilsTest
>> >> > >
>> >> > > log4j:WARN No such property [append] in
>> >> org.apache.log4j.ConsoleAppender.
>> >> > >
>> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
>> 0.476
>> >> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
>> >> > >
>> >> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time elapsed:
>> >> > > 0.405 sec  <<< FAILURE!
>> >> > >
>> >> > > java.lang.AssertionError: null
>> >> > >
>> >> > > at org.junit.Assert.fail(Assert.java:86)
>> >> > >
>> >> > > at org.junit.Assert.assertTrue(Assert.java:41)
>> >> > >
>> >> > > at org.junit.Assert.assertNotNull(Assert.java:621)
>> >> > >
>> >> > > at org.junit.Assert.assertNotNull(Assert.java:631)
>> >> > >
>> >> > > at
>> >> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
>> >> > >
>> >> > >
>> >> > >
>> >> > > Results :
>> >> > >
>> >> > >
>> >> > > Failed tests:
>> >> > >
>> >> > >   UtilsTest.testUberjarLocator:32 null
>> >> > >
>> >> > >
>> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
>> >> > >
>> >> > >
>> >> > > - Henry
>> >> > >
>> >> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske 
>> >> > wrote:
>> >> > >> Hi all,
>> >> > >>
>> >> > >> I tried to build the current master (mvn clean install) and some
>> >> tests
>> >> > in
>> >> > >> the flink-yarn-tests module fail:
>> >> > >>
>> >> > >> Failed tests:
>> >> > >>
>> >> > >>
>> >> > >>
>> >> >
>> >>
>> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
>> >> > >> During the timeout period of 60 seconds the expected string did
>> not
>> >> > show up
>> >> > >>
>> >> > >>
>> >>  YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
>> >> > >> There is at least one application on the cluster is not finished
>> >> > >>
>> >> > >>
>> >> >
>> >>
>> YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
>> >> > >> During the timeout period of 60 seconds the expected string did
>> not
>> >> > show up
>> >> > >>
>> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There
>> is
>> >> at
>> >> > >> least one application on the cluster is not finished
>> >> > >>
>> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There
>> is
>> >> at
>> >> > >> least one application on the cluster is not finished
>> >> > >>
>> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There
>> is
>> >> at
>> >> > >> least one application on the cluster is not finished
>> >> > >>
>> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There
>

Re: Adding non-core API features to Flink

2015-01-24 Thread Chiwan Park
I think a top-level Maven module called "flink-contrib" is reasonable. There are
other projects with a contrib package, such as Akka and Django.

Regards, Chiwan Park (Sent with iPhone)

On Jan 24, 2015, at 7:15 PM, Fabian Hueske  wrote:

> Hi all,
> 
> we got a few contribution requests lately to add cool but "non-core"
> features to our API.
> In previous discussions, concerns were raised about bloating the APIs with
> too many "shortcut", "syntactic sugar", or special-case features.
> 
> Instead we could set up a place to add Input/OutputFormats, common
> operations, etc. that do not need as much control as the core APIs. Open
> questions are:
> - How do we organize it? (top-level maven module, modules in flink-java,
> flink-scala, java packages in the API modules, ...)
> - How do we name it? flink-utils, flink-packages, ...
> 
> Any opinions on this?
> 
> Cheers, Fabian


[jira] [Created] (FLINK-1441) Documentation SVN checkout link is wrong

2015-01-24 Thread Rahul Mahindrakar (JIRA)
Rahul Mahindrakar created FLINK-1441:


 Summary: Documentation SVN checkout link is wrong
 Key: FLINK-1441
 URL: https://issues.apache.org/jira/browse/FLINK-1441
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Rahul Mahindrakar
Priority: Trivial


In the documentation over here 

http://flink.apache.org/how-to-contribute.html it states 
--
To make changes to the website, you have to checkout the source code of it 
first:

svn checkout https://svn.apache.org/repos/asf/incubator/flink/
cd flink
--
When I try this, it states:

hduser@ubuntu:~/Desktop/Flink/website$ svn checkout 
https://svn.apache.org/repos/asf/incubator/flink/
svn: E175011: Repository moved permanently to 
'https://svn.apache.org/repos/asf/flink'; please relocate

Though there is a clear message on what to do, I would like to take a first try at
changing the documentation and get to understand the contribution process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
Thanks Robert! Sounds indeed like an environment problem.
Will run the tests again and send you the output.

2015-01-24 11:11 GMT+01:00 Robert Metzger :

> Okay, the tests have finished on my local machine, and they passed. So it
> looks like an environment-specific issue.
> Maybe the log already helps me figure out what's the issue.
> We should make sure that our tests are passing on all platforms ;)
>
> On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger 
> wrote:
>
> > Hi,
> >
> > the tests are passing on Travis. Maybe it's an issue with your environment.
> > I'm currently running the tests on my machine as well, just to make sure.
> > I haven't run the tests on OS X, maybe that's causing the issues.
> >
> > Can you send me (privately) the full output of the tests?
> >
> > Best,
> > Robert
> >
> >
> >
> > On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske 
> wrote:
> >
> >> Hi Henry,
> >>
> >> running "mvn -DskipTests clean install" before "mvn clean install" did
> not
> >> fix the build for me.
> >> The failing tests are also integration tests (*ITCase) which are only
> >> executed in Maven's verify phase which is not triggered if you run "mvn
> >> clean test".
> >> If I run "mvn test" without "mvn install" it fails for me as well with
> the
> >> error you posted.
> >>
> >> So there seem to be at least two build issues with the current master.
> >>
> >> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
> >>
> >> > Hmm, I think there could be some weird dependencies to get the Flink
> >> > YARN uber jar.
> >> >
> >> > If you do "mvn clean install -DskipTests" then call "mvn test" all the
> >> > tests passed.
> >> >
> >> > But if you directly call "mvn clean test" then you see the stack I
> >> > have seen before.
> >> >
> >> > - Henry
> >> >
> >> >
> >> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra <
> henry.sapu...@gmail.com
> >> >
> >> > wrote:
> >> > > Did not see that trace but do see this:
> >> > >
> >> > > ---
> >> > >
> >> > >  T E S T S
> >> > >
> >> > > ---
> >> > >
> >> > > Running org.apache.flink.yarn.UtilsTest
> >> > >
> >> > > log4j:WARN No such property [append] in
> >> org.apache.log4j.ConsoleAppender.
> >> > >
> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
> 0.476
> >> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> >> > >
> >> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time elapsed:
> >> > > 0.405 sec  <<< FAILURE!
> >> > >
> >> > > java.lang.AssertionError: null
> >> > >
> >> > > at org.junit.Assert.fail(Assert.java:86)
> >> > >
> >> > > at org.junit.Assert.assertTrue(Assert.java:41)
> >> > >
> >> > > at org.junit.Assert.assertNotNull(Assert.java:621)
> >> > >
> >> > > at org.junit.Assert.assertNotNull(Assert.java:631)
> >> > >
> >> > > at
> >> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> >> > >
> >> > >
> >> > >
> >> > > Results :
> >> > >
> >> > >
> >> > > Failed tests:
> >> > >
> >> > >   UtilsTest.testUberjarLocator:32 null
> >> > >
> >> > >
> >> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> >> > >
> >> > >
> >> > > - Henry
> >> > >
> >> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske 
> >> > wrote:
> >> > >> Hi all,
> >> > >>
> >> > >> I tried to build the current master (mvn clean install) and some
> >> tests
> >> > in
> >> > >> the flink-yarn-tests module fail:
> >> > >>
> >> > >> Failed tests:
> >> > >>
> >> > >>
> >> > >>
> >> >
> >>
> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
> >> > >> During the timeout period of 60 seconds the expected string did not
> >> > show up
> >> > >>
> >> > >>
> >>  YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
> >> > >> There is at least one application on the cluster is not finished
> >> > >>
> >> > >>
> >> >
> >>
> YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
> >> > >> During the timeout period of 60 seconds the expected string did not
> >> > show up
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster is not finished
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster is not finished
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster is not finished
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster is not finished
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster is not finished
> >> > >>
> >> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
> >> at
> >> > >> least one application on the cluster 

Adding non-core API features to Flink

2015-01-24 Thread Fabian Hueske
Hi all,

we got a few contribution requests lately to add cool but "non-core"
features to our API.
In previous discussions, concerns were raised about bloating the APIs with
too many "shortcut", "syntactic sugar", or special-case features.

Instead we could set up a place to add Input/OutputFormats, common
operations, etc. that do not need as much control as the core APIs. Open
questions are:
- How do we organize it? (top-level maven module, modules in flink-java,
flink-scala, java packages in the API modules, ...)
- How do we name it? flink-utils, flink-packages, ...

Any opinions on this?

Cheers, Fabian


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Robert Metzger
Okay, the tests have finished on my local machine, and they passed. So it
looks like an environment-specific issue.
Maybe the log already helps me figure out what's the issue.
We should make sure that our tests are passing on all platforms ;)

On Sat, Jan 24, 2015 at 11:06 AM, Robert Metzger 
wrote:

> Hi,
>
> the tests are passing on Travis. Maybe it's an issue with your environment.
> I'm currently running the tests on my machine as well, just to make sure.
> I haven't run the tests on OS X, maybe that's causing the issues.
>
> Can you send me (privately) the full output of the tests?
>
> Best,
> Robert
>
>
>
> On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske  wrote:
>
>> Hi Henry,
>>
>> running "mvn -DskipTests clean install" before "mvn clean install" did not
>> fix the build for me.
>> The failing tests are also integration tests (*ITCase) which are only
>> executed in Maven's verify phase which is not triggered if you run "mvn
>> clean test".
>> If I run "mvn test" without "mvn install" it fails for me as well with the
>> error you posted.
>>
>> So there seem to be at least two build issues with the current master.
>>
>> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
>>
>> > Hmm, I think there could be some weird dependencies to get the Flink
>> > YARN uber jar.
>> >
>> > If you do "mvn clean install -DskipTests" then call "mvn test" all the
>> > tests passed.
>> >
>> > But if you directly call "mvn clean test" then you see the stack I
>> > have seen before.
>> >
>> > - Henry
>> >
>> >
>> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra > >
>> > wrote:
>> > > Did not see that trace but do see this:
>> > >
>> > > ---
>> > >
>> > >  T E S T S
>> > >
>> > > ---
>> > >
>> > > Running org.apache.flink.yarn.UtilsTest
>> > >
>> > > log4j:WARN No such property [append] in
>> org.apache.log4j.ConsoleAppender.
>> > >
>> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.476
>> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
>> > >
>> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time elapsed:
>> > > 0.405 sec  <<< FAILURE!
>> > >
>> > > java.lang.AssertionError: null
>> > >
>> > > at org.junit.Assert.fail(Assert.java:86)
>> > >
>> > > at org.junit.Assert.assertTrue(Assert.java:41)
>> > >
>> > > at org.junit.Assert.assertNotNull(Assert.java:621)
>> > >
>> > > at org.junit.Assert.assertNotNull(Assert.java:631)
>> > >
>> > > at
>> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
>> > >
>> > >
>> > >
>> > > Results :
>> > >
>> > >
>> > > Failed tests:
>> > >
>> > >   UtilsTest.testUberjarLocator:32 null
>> > >
>> > >
>> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
>> > >
>> > >
>> > > - Henry
>> > >
>> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske 
>> > wrote:
>> > >> Hi all,
>> > >>
>> > >> I tried to build the current master (mvn clean install) and some
>> tests
>> > in
>> > >> the flink-yarn-tests module fail:
>> > >>
>> > >> Failed tests:
>> > >>
>> > >>
>> > >>
>> >
>> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
>> > >> During the timeout period of 60 seconds the expected string did not
>> > show up
>> > >>
>> > >>
>>  YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
>> > >> There is at least one application on the cluster is not finished
>> > >>
>> > >>
>> >
>> YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
>> > >> During the timeout period of 60 seconds the expected string did not
>> > show up
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is
>> at
>> > >> least one application on the cluster is not finished
>> > >>
>> > >>
>> > >> Tests run: 10, Failures: 10, Errors: 0, Skipped: 0
>> > >>
>> > >> Anybody else got this problem?
>> > >>
>> > >> Cheers, Fabian
>> >
>>
>
>


Re: Tweets Custom Input Format

2015-01-24 Thread Fabian Hueske
Hi Mustafa,

that would be a nice contribution!

We are currently discussing how to add "non-core" API features into Flink
[1].
I will move this discussion onto the mailing list to decide where to add
cool add-ons like yours.

Cheers, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-1398

2015-01-23 20:42 GMT+01:00 Henry Saputra :

> Contributions are welcomed!
>
> Here is the link on how to contribute to Apache Flink:
> http://flink.apache.org/how-to-contribute.html
>
> You can start by creating JIRA ticket [1] to help describe what you
> wanted to do and to get feedback from community.
>
>
> - Henry
>
> [1] https://issues.apache.org/jira/secure/Dashboard.jspa
>
> On Fri, Jan 23, 2015 at 10:54 AM, Mustafa Elbehery
>  wrote:
> > Hi,
> >
> > I have created a custom InputFormat for tweets on Flink, based on the
> > JSON-Simple event-driven parser. I would like to contribute my work to
> > Flink.
> >
> > Regards.
> >
> > --
> > Mustafa Elbehery
> > EIT ICT Labs Master School 
> > +49(0)15218676094
> > skype: mustafaelbehery87
>


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Robert Metzger
Hi,

the tests are passing on Travis. Maybe it's an issue with your environment.
I'm currently running the tests on my machine as well, just to make sure.
I haven't run the tests on OS X, maybe that's causing the issues.

Can you send me (privately) the full output of the tests?

Best,
Robert



On Sat, Jan 24, 2015 at 11:00 AM, Fabian Hueske  wrote:

> Hi Henry,
>
> running "mvn -DskipTests clean install" before "mvn clean install" did not
> fix the build for me.
> The failing tests are also integration tests (*ITCase) which are only
> executed in Maven's verify phase which is not triggered if you run "mvn
> clean test".
> If I run "mvn test" without "mvn install" it fails for me as well with the
> error you posted.
>
> So there seem to be at least two build issues with the current master.
>
> 2015-01-24 1:47 GMT+01:00 Henry Saputra :
>
> > Hmm, I think there could be some weird dependencies to get the Flink
> > YARN uber jar.
> >
> > If you do "mvn clean install -DskipTests" then call "mvn test" all the
> > tests passed.
> >
> > But if you directly call "mvn clean test" then you see the stack I
> > have seen before.
> >
> > - Henry
> >
> >
> > On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra 
> > wrote:
> > > Did not see that trace but do see this:
> > >
> > > ---
> > >
> > >  T E S T S
> > >
> > > ---
> > >
> > > Running org.apache.flink.yarn.UtilsTest
> > >
> > > log4j:WARN No such property [append] in
> org.apache.log4j.ConsoleAppender.
> > >
> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.476
> > > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> > >
> > > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time elapsed:
> > > 0.405 sec  <<< FAILURE!
> > >
> > > java.lang.AssertionError: null
> > >
> > > at org.junit.Assert.fail(Assert.java:86)
> > >
> > > at org.junit.Assert.assertTrue(Assert.java:41)
> > >
> > > at org.junit.Assert.assertNotNull(Assert.java:621)
> > >
> > > at org.junit.Assert.assertNotNull(Assert.java:631)
> > >
> > > at
> org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> > >
> > >
> > >
> > > Results :
> > >
> > >
> > > Failed tests:
> > >
> > >   UtilsTest.testUberjarLocator:32 null
> > >
> > >
> > > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> > >
> > >
> > > - Henry
> > >
> > > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske 
> > wrote:
> > >> Hi all,
> > >>
> > >> I tried to build the current master (mvn clean install) and some tests
> > in
> > >> the flink-yarn-tests module fail:
> > >>
> > >> Failed tests:
> > >>
> > >>
> > >>
> >
> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
> > >> During the timeout period of 60 seconds the expected string did not
> > show up
> > >>
> > >>
>  YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
> > >> There is at least one application on the cluster is not finished
> > >>
> > >>
> >
> YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
> > >> During the timeout period of 60 seconds the expected string did not
> > show up
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> > >> least one application on the cluster is not finished
> > >>
> > >>
> > >> Tests run: 10, Failures: 10, Errors: 0, Skipped: 0
> > >>
> > >> Anybody else got this problem?
> > >>
> > >> Cheers, Fabian
> >
>


Re: YARN ITCases fail, master broken?

2015-01-24 Thread Fabian Hueske
Hi Henry,

running "mvn -DskipTests clean install" before "mvn clean install" did not
fix the build for me.
The failing tests are also integration tests (*ITCase) which are only
executed in Maven's verify phase which is not triggered if you run "mvn
clean test".
If I run "mvn test" without "mvn install" it fails for me as well with the
error you posted.

So there seem to be at least two build issues with the current master.
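As a small illustration of that test/verify split (assuming Maven's default Surefire and Failsafe naming conventions, not anything Flink-specific), the class name alone determines which lifecycle phase picks a test up:

```shell
# Hedged sketch: classify test class names the way Maven's default
# Surefire/Failsafe include patterns do (an assumption about the build
# setup, not taken from Flink's actual poms).
phase_for() {
  case "$1" in
    IT*|*IT|*ITCase)    echo "verify" ;;  # failsafe: integration tests
    Test*|*Test|*Tests) echo "test"   ;;  # surefire: unit tests
    *)                  echo "none"   ;;
  esac
}

phase_for YARNSessionFIFOITCase   # -> verify (skipped by "mvn clean test")
phase_for UtilsTest               # -> test   (runs in "mvn clean test")
```

Under these conventions the *ITCase classes only run once the verify phase is reached, which also means they execute after the module has been packaged, unlike anything run by plain "mvn clean test".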

2015-01-24 1:47 GMT+01:00 Henry Saputra :

> Hmm, I think there could be some weird dependencies to get the Flink
> YARN uber jar.
>
> If you do "mvn clean install -DskipTests" then call "mvn test" all the
> tests passed.
>
> But if you directly call "mvn clean test" then you see the stack I
> have seen before.
>
> - Henry
>
>
> On Fri, Jan 23, 2015 at 3:35 PM, Henry Saputra 
> wrote:
> > Did not see that trace but do see this:
> >
> > ---
> >
> >  T E S T S
> >
> > ---
> >
> > Running org.apache.flink.yarn.UtilsTest
> >
> > log4j:WARN No such property [append] in org.apache.log4j.ConsoleAppender.
> >
> > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.476
> > sec <<< FAILURE! - in org.apache.flink.yarn.UtilsTest
> >
> > testUberjarLocator(org.apache.flink.yarn.UtilsTest)  Time elapsed:
> > 0.405 sec  <<< FAILURE!
> >
> > java.lang.AssertionError: null
> >
> > at org.junit.Assert.fail(Assert.java:86)
> >
> > at org.junit.Assert.assertTrue(Assert.java:41)
> >
> > at org.junit.Assert.assertNotNull(Assert.java:621)
> >
> > at org.junit.Assert.assertNotNull(Assert.java:631)
> >
> > at org.apache.flink.yarn.UtilsTest.testUberjarLocator(UtilsTest.java:32)
> >
> >
> >
> > Results :
> >
> >
> > Failed tests:
> >
> >   UtilsTest.testUberjarLocator:32 null
> >
> >
> > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
> >
> >
> > - Henry
> >
> > On Fri, Jan 23, 2015 at 2:16 PM, Fabian Hueske 
> wrote:
> >> Hi all,
> >>
> >> I tried to build the current master (mvn clean install) and some tests
> in
> >> the flink-yarn-tests module fail:
> >>
> >> Failed tests:
> >>
> >>
> >>
> YARNSessionCapacitySchedulerITCase.testClientStartup:50->YarnTestBase.runWithArgs:314
> >> During the timeout period of 60 seconds the expected string did not
> show up
> >>
> >>   YARNSessionCapacitySchedulerITCase>YarnTestBase.checkClusterEmpty:146
> >> There is at least one application on the cluster is not finished
> >>
> >>
>  YARNSessionFIFOITCase.perJobYarnCluster:184->YarnTestBase.runWithArgs:314
> >> During the timeout period of 60 seconds the expected string did not
> show up
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>   YARNSessionFIFOITCase>YarnTestBase.checkClusterEmpty:146 There is at
> >> least one application on the cluster is not finished
> >>
> >>
> >> Tests run: 10, Failures: 10, Errors: 0, Skipped: 0
> >>
> >> Anybody else got this problem?
> >>
> >> Cheers, Fabian
>