[GitHub] flink issue #2337: [FLINK-3042] [FLINK-3060] [types] Define a way to let typ...

2016-10-08 Thread aalexandrov
Github user aalexandrov commented on the issue:

https://github.com/apache/flink/pull/2337
  
@greghogan I see, IMHO then the `1.0.0` tag should be dropped from the "Fix 
versions" field (if possible).


---


[GitHub] flink issue #2337: [FLINK-3042] [FLINK-3060] [types] Define a way to let typ...

2016-10-08 Thread aalexandrov
Github user aalexandrov commented on the issue:

https://github.com/apache/flink/pull/2337
  
@twalthr The issue is marked with "Fix versions: 1.0.0, 1.2.0" but I could 
only find the `TypeInfoFactory` in the `master` branch. Shouldn't this be 
visible in the `release-1.0` branch as well?


---


[GitHub] flink issue #1517: [FLINK-3477] [runtime] Add hash-based combine strategy fo...

2016-06-19 Thread aalexandrov
Github user aalexandrov commented on the issue:

https://github.com/apache/flink/pull/1517
  
@ggevay :+1: 


---


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-04-10 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-208083609
  
We've summarized the use-case around the hash aggregation experiments in [a 
blog post on the Peel 
webpage](http://peel-framework.org/2016/04/07/hash-aggregations-in-flink.html).

If you follow the instructions from the **Repeatability** section you 
should be able to reproduce the results on other environments without too much 
hassle.

I hope that this will be the first of many Flink-related public Peel 
bundles.


---


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-03-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202868272
  
Overall it seems that the hash-based combiner works better than the 
sort-based one for (a) uniform or normal key distributions, and (b) 
fixed-length records.

For skewed key distributions (like Zipf) the two strategies are practically 
equal, and for variable-length records the extra effort of compacting the 
records offsets the advantages of the hash-based aggregation approach.


---


[GitHub] flink pull request: [FLINK-3477] [runtime] Add hash-based combine ...

2016-03-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-202864630
  
@fhueske We have used the Easter break to conduct the experiments. A 
preliminary write-up is in the Google Doc. @ggevay will provide the analysis of 
the results later today. Cheers!


---


[GitHub] flink pull request: [FLINK-2237] [runtime] Add hash-based combiner...

2016-02-22 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-187243415
  
I've added a [Google 
Doc](https://docs.google.com/document/d/12yx7olVrkooceaQPoR1nkk468lIq0xOObY5ukWuNEcM/edit?usp=sharing)
 where we can collaborate on the design of the experiments.

Once we have settled on the design, we will proceed with implementing the 
experiments. The code [will be available in the `flink-hashagg-experiments` 
repository](https://github.com/TU-Berlin-DIMA/flink-hashagg-experiments).


---


[GitHub] flink pull request: [FLINK-2237] [runtime] Add hash-based combiner...

2016-02-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-186257996
  
@fhueske We will propose three experiments based on your suggestions in a 
Google Doc on Monday. Once we have fixed the setup we will prepare a Peel 
bundle and run them on one of the clusters in the lab.

I would also be happy to promote Peel by leading a joint blog post for the 
Peel website together with @ggevay and you, if you are interested. I think that 
the hash table makes for a perfect use case.


---


[GitHub] flink pull request: [FLINK-2237] [runtime] Add hash-based combiner...

2016-02-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1517#issuecomment-186226154
  
@ggevay I would be interested in helping you prepare a 
[`peel-flink-bundle`](http://peel-framework.org/getting-started.html) for the 
benchmarks @fhueske mentioned.

It would make a perfect use case for what Peel is intended for, and a nice 
first contribution to the [bundles 
repository](http://peel-framework.org/repository/bundles.html).


---


[GitHub] flink pull request: [docs] Fix documentation for building Flink wi...

2015-10-21 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1260#issuecomment-149848186
  
@tillrohrmann The list of TODOs in the JIRA issue refers specifically to 
IntelliJ. Is there an Eclipse user who can work out the corresponding actions 
for Eclipse after I open a PR for that?

Also, I think that this belongs in the part of the documentation related to 
development.


---


[GitHub] flink pull request: [docs] Fix documentation for building Flink wi...

2015-10-21 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1260#issuecomment-149847793
  
:+1: 


---


[GitHub] flink pull request: [FLINK-2809] [scala-api] Added UnitTypeInfo an...

2015-10-17 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/1217#issuecomment-148901986
  
> Not all methods without parameters should translate to methods without 
parentheses...

@StephanEwen I agree with that, but I cannot see how the `UnitTypeInfo` might 
cause confusion here.

The `TypeInformation` instances are synthesized by the macro based on the 
inferred collection type, which means that the meaning of `()` is resolved 
before that. Consider the following example:

```scala
// in the Scala REPL

case class Foo(answer: Int)
// defined class Foo

def f1(): Foo = Foo(42)
// f1: ()Foo

def f2: Foo = Foo(42)
// f2: Foo

val xs = Seq(f1(), f2) // how a literate person would write it
// xs: Seq[Foo] = List(Foo(42), Foo(42))

val xs = Seq(f1, f2) // how a dazed & confused person would write it, but still compiles
// xs: Seq[Foo] = List(Foo(42), Foo(42))

val xs = Seq(f1, f2()) // even worse, and this breaks with a compile error
// error: Foo does not take parameters
//   val xs = Seq(f1, f2())

val xs = Seq((), ()) // typing '()' without syntactic context resolves to Unit
// xs: Seq[Unit] = List((), ())
```

In all of the above situations `env.fromCollection(xs)` will either (1) 
typecheck and trigger `TypeInformation` synthesis, or (2) fail with the compile 
error above.

Can you point to a StackOverflow conversation or something similar where the 
issue you mention is explained with an example?


---


[GitHub] flink pull request: [docs] Fix documentation for building Flink wi...

2015-10-15 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/1260

[docs] Fix documentation for building Flink with Scala 2.11 or 2.10

This fixes some dead anchor links and aligns the text in "Build Flink for a 
specific Scala version" with the commands required by the current master.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink doc-scalabuild-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1260


commit fd86d72394b4dedaf6724398b041655825f0c7d4
Author: Alexander Alexandrov 
Date:   2015-10-15T22:21:23Z

[docs] Fix documentation for building Flink with Scala 2.11 or 2.10




---


[GitHub] flink pull request: Removed broken dependency to flink-spargel.

2015-10-15 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/1259

Removed broken dependency to flink-spargel.

I think that this should have been removed as part of #1229.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/1259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1259


commit dff19c4b27f201cfcbe104073005bc9aad2cbb45
Author: Alexander Alexandrov 
Date:   2015-10-15T21:18:39Z

Removed broken dependency to flink-spargel.

I think that this should have been removed as part of #1229.




---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-10-06 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-145984551
  
It could be that the test is not executed for Scala 2.11.


---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-10-06 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-145974916
  
@mxm The issue is probably related to Scala 2.10, since the two passing 
builds have `-Dscala-2.11`. I'll investigate tonight if I can get a hold of the 
PR commits (GitHub is somewhat slow at the moment).


---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-09-24 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-142960455
  
@mxm The jars in which folder? The main motivation of the pull request is to 
add the option to register classpaths for code that is generated at runtime.


---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-09-18 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-141459804
  
@rmetzger regarding the final 0.10: this and the Scala 2.11 compatibility are 
the two pending issues that make the current Emma master incompatible with 
vanilla Flink.


---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-09-18 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-141455406
  
@twalthr @rmetzger is there a chance to include this PR in the 0.10 release?


---


[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...

2015-08-04 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/605#issuecomment-127558961
  
@StephanEwen To be honest I'm kind of puzzled and somewhat annoyed that a 
PR that

1) adds a feature that has been on the list for at least 2 years,
2) has been tested and evaluated thoroughly, and
3) has been lying around since April and was rebased at least three times

is still open.

I was also not aware that there is a strict maintainer policy for 
`flink-contrib` commits. IMHO this is a good place to expose non-critical / 
exploratory / unstable features that might be of benefit to multiple users of 
the project. If these turn out to be useful for enough people, I'm sure there 
will be an incentive to maintain them. Otherwise there is always the option to 
deprecate and clean up recent additions to contrib prior to the next release 
if nobody has picked up on them.


---


[GitHub] flink pull request: [FLINK-2408] Define all maven properties outsi...

2015-07-27 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/941#issuecomment-125192692
  
I guess the "s" in the name is the giveaway...


---


[GitHub] flink pull request: [FLINK-2408] Define all maven properties outsi...

2015-07-27 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/941#issuecomment-125189988
  
If sbt can resolve profile-based dependencies, this should be OK.
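
For reference, sbt users typically rely on the artifact-name suffix convention 
rather than Maven profiles. A minimal `build.sbt` sketch, assuming the 
cross-built `_2.11` artifacts are published:

```scala
// %% appends the project's Scala binary version to the artifact id,
// resolving e.g. flink-scala_2.11 when scalaVersion is 2.11.x.
val flinkVersion = "0.10.0" // hypothetical version with cross-built artifacts
libraryDependencies += "org.apache.flink" %% "flink-scala" % flinkVersion
```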


---


[GitHub] flink pull request: [FLINK-2408] Define all maven properties outsi...

2015-07-27 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/941#issuecomment-125182364
  
Then it should be good to merge.


---


[GitHub] flink pull request: [FLINK-2408] Define all maven properties outsi...

2015-07-27 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/941#issuecomment-125180542
  
Is the removal of the property-based activation (`-Dscala-2.11`) also 
intentional? I think this might be used in `.travis.yml`.


---


[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-24 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/935#issuecomment-124582468
  
PS. The third commit fixes a compilation error in IntelliJ when the 
'scala_2.11' profile is active.


---


[GitHub] flink pull request: [FLINK-2231] Create a Serializer for Scala Enu...

2015-07-24 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/935

[FLINK-2231] Create a Serializer for Scala Enumerations.

This closes FLINK-2231.

The code should work for all objects which follow [the Enumeration idiom 
outlined in the 
ScalaDoc](http://www.scala-lang.org/api/2.11.5/index.html#scala.Enumeration).

The second commit removes the boilerplate code from the 
`EnumValueComparator` by delegating to an `IntComparator`; you can either 
discard or squash it while merging, depending on your preference.

Bear in mind the FIXME at line 368 in `TypeAnalyzer.scala`. The commented 
code is better, but unfortunately it doesn't work with Scala 2.10, so I used 
the FQN workaround.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink FLINK-2231

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/935.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #935


commit fd69bda383f6771e87ded1b4b595a395519efd6e
Author: Alexander Alexandrov 
Date:   2015-07-24T10:36:17Z

[FLINK-2231] Create a Serializer for Scala Enumerations.

commit dca03720d090c383f88a57af6808fdbfd2c4ec29
Author: Alexander Alexandrov 
Date:   2015-07-24T16:43:14Z

Delegating EnumValueComparator.




---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-10 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-120423639
  
Looks very neat!


---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-08 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-119622710
  
@rmetzger if you can hack your way with regexes in IntelliJ / Eclipse, most 
of the work should be handled by the IDE.


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-06 Thread aalexandrov
Github user aalexandrov closed the pull request at:

https://github.com/apache/flink/pull/880


---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-03 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/885#discussion_r33858475
  
--- Diff: docs/apis/programming_guide.md ---
@@ -187,7 +187,17 @@ that creates the type information for Flink operations.
 
 
 
+ Scala Dependency Versions
 
+Because Scala 2.10 binary is not compatible with Scala 2.11 binary, we 
provide multiple artifacts
+to support both Scala versions. If you want to run your program on Flink 
with Scala 2.11, you need
+to add a suffix `_2.11` to all Flink artifact ids in your dependencies. 
You should be careful with
+this difference of artifact id. All modules with Scala 2.11 have a suffix 
`_2.11` in artifact id.
+For example, `flink-java` should be changed to `flink-java_2.11` and 
`flink-clients` should be
+changed to `flink-clients_2.11`.
--- End diff --

Maybe a list of all modules indicating which require a suffix and which 
do not would be helpful (or you could just link to Maven Central).


---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-03 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/885#discussion_r33858383
  
--- Diff: docs/apis/programming_guide.md ---
@@ -187,7 +187,17 @@ that creates the type information for Flink operations.
 
 
 
+ Scala Dependency Versions
 
+Because Scala 2.10 binary is not compatible with Scala 2.11 binary, we 
provide multiple artifacts
+to support both Scala versions. If you want to run your program on Flink 
with Scala 2.11, you need
+to add a suffix `_2.11` to all Flink artifact ids in your dependencies. 
You should be careful with
+this difference of artifact id. All modules with Scala 2.11 have a suffix 
`_2.11` in artifact id.
+For example, `flink-java` should be changed to `flink-java_2.11` and 
`flink-clients` should be
+changed to `flink-clients_2.11`.
--- End diff --

Because Scala 2.10 binary is not compatible with Scala 2.11 binary, we 
provide multiple artifacts
to support both Scala versions. 

Starting from the 0.10 line, we cross-build all Scala-dependent Flink 
modules for both 2.10 and 2.11.
If you want to run your program on Flink with Scala 2.11, you need to add a 
`_2.11` suffix to the `artifactId` values of the Scala-dependent Flink modules 
in your `dependencies` section.


---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-03 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-118326627
  
I think that a cleaner solution would be to 

1. Add a `scala.suffix` property to the Scala profiles. This should be set to 
the empty string for 2.10 and to `_${scala.binary.version}` for 2.11.

2. Append the suffix to the required `artifactId` elements. For example, 
instead of 

```
<artifactId>${flink.clients.artifactId}</artifactId>
``` 

you would write 

```
<artifactId>flink-clients${scala.suffix}</artifactId>
```

IMHO this makes everything a bit easier to read, and it is then 
straightforward to migrate to the "best practice" way of having a suffix for 
all profiles in the future.


---


[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...

2015-07-03 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/885#discussion_r33857654
  
--- Diff: docs/apis/programming_guide.md ---
@@ -136,7 +136,7 @@ mvn archetype:generate /
 
 
 
-The archetypes are working for stable releases and preview versions 
(`-SNAPSHOT`)
+The archetypes are working for stable releases and preview versions. 
(`-SNAPSHOT`)
--- End diff --

The dot should be after (`-SNAPSHOT`).


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/880#issuecomment-118083599
  
OK, then we can close the issue as wontfix.


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/880#issuecomment-118066010
  
OK, thanks for the remark. 

Although somewhat verbose, this solves my concrete issue. I wonder if the 
list can be exclusive, e.g.

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-tweet-inputformat</artifactId>
    <version>${project.version}</version>
  </dependency>
</dependencies>
```

However, it still leaves open the question of what policy to use for 
`flink-*` dependencies. 

As an example, if I want to include `flink-streaming-contrib` in a module 
that is built as a "thin" jar, transitive `flink-*` dependencies that will be 
packaged in the `flink-dist` fat jar do not need to be included.

I therefore suggest at least making this consistent [with the quickstart 
poms](https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml)
 and [optionally setting the scope of already included dependencies to `provided` 
if the `build-jar` profile is 
activated](https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml#L317).


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/880#issuecomment-118063336
  
The PR is still open and can be adapted.


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/880#issuecomment-118036089
  
@StephanEwen can you tell me what package doesn't work so I can test it out?


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/880#issuecomment-118035911
  
@mjsax exactly. Let's say at the moment I land in the following situation: 

1. I want to create a job which includes the [statistics collection 
facility proposed by 
@tammymendt](https://issues.apache.org/jira/browse/FLINK-1297). 
1. In order to do that, my project needs to set `flink-contrib` with scope 
`compile` and use it in the final artifact (even if I am building a *thin* jar 
which is going to be submitted to a standalone Flink cluster).
1. The resulting jar turns out to be ~ 64 MB. Out of that, half of the bits are 
taken by code that is *guaranteed* to be on the classpath at runtime by virtue of 
being included in `$FLINK_DIR/lib/flink-dist-$VERSION.jar`.
1. Setting all transitive `flink-*` dependencies pulled in via `flink-contrib` 
to `provided` reduces the size to ~ 32 MB by default.


---


[GitHub] flink pull request: [FLINK-2311] Fixes 'flink-*' dependency scope ...

2015-07-02 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/880

[FLINK-2311] Fixes 'flink-*' dependency scope in flink-contrib.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink FLINK-2311

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #880


commit f6e5dce4dfdd5f0cf1c847171fd48df010232790
Author: Alexander Alexandrov 
Date:   2015-07-02T07:25:03Z

[FLINK-2311] Fixes 'flink-*' dependency scope in flink-contrib.




---


[GitHub] flink pull request: [Flink-1999] basic TfidfTransformer

2015-06-17 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/730#discussion_r32620391
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala
 ---
@@ -0,0 +1,128 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature
+
+import org.apache.flink.api.scala.{DataSet, _}
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.math.SparseVector
+
+import scala.collection.mutable.LinkedHashSet
+import scala.math.log;
+
+/**
+ * This transformer calcuates the term-frequency times inverse document 
frequency for the give DataSet of documents.
+ * The DataSet will be treated as the corpus of documents. The single 
words will be filtered against the regex:
+ * 
+ * (?u)\b\w\w+\b
+ * 
+ * 
+ * The TF is the frequence of a word inside one document
+ * 
+ * The IDF for a word is calculated: log(total number of documents / 
documents that contain the word) + 1
--- End diff --

I think this was aligned with the behaviour observed in scikit-learn.


---


[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/665#issuecomment-106914608
  
@FelixNeutatz I think this needs to be squashed before merging as well.


---


[GitHub] flink pull request: [FLINK-1731] [ml] Implementation of Feature K-...

2015-05-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/700#issuecomment-106911412
  
Can anybody with more Apache insight answer @peedeeX21's concerns? 
Otherwise I suggest merging this and opening a follow-up issue that extends the 
current implementation to KMeans++.


---


[GitHub] flink pull request: basic TfidfTransformer

2015-05-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/730#issuecomment-106910157
  
@rbraeunlich please squash the commits and prepare this for merging.


---


[GitHub] flink pull request: [FLINK-1979] Lossfunctions

2015-05-29 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/656#issuecomment-106907665
  
This needs a PR before it is merged. @jojo19893, @mguldner please 
cherry-pick your changes such that you have only one or two commits, and 
force-push to this branch as suggested by @tillrohrmann in the JIRA.


---


[GitHub] flink pull request: basic TfidfTransformer

2015-05-27 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/730#discussion_r31114073
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala
 ---
@@ -0,0 +1,107 @@
+package org.apache.flink.ml.feature
+
+import org.apache.flink.api.scala.{DataSet, _}
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.math.SparseVector
+
+import scala.math.log
+import scala.util.hashing.MurmurHash3
+import scala.collection.mutable.LinkedHashSet;
+
+/**
+ * This transformer calcuates the term-frequency times inverse document 
frequency for the give DataSet of documents.
+ * The DataSet will be treated as the corpus of documents.
+ * 
+ * The TF is the frequence of a word inside one document
+ * 
+ * The IDF for a word is calculated: log(total number of documents / 
documents that contain the word) + 1
+ * 
+ * This transformer returns a SparseVector where the index is the hash of 
the word and the value the tf-idf.
+ * @author Ronny Bräunlich
+ * @author Vassil Dimov
+ * @author Filip Perisic
+ */
+class TfIdfTransformer extends Transformer[(Int, Seq[String]), (Int, 
SparseVector)] {
+
+  override def transform(input: DataSet[(Int /* docId */ , Seq[String] 
/*The document */ )], transformParameters: ParameterMap): DataSet[(Int, 
SparseVector)] = {
+
+val params = transformParameters.get(StopWordParameter)
+
+// Here we will store the words in he form (docId, word, count)
+// Count represent the occurrence of "word" in document "docId"
+val wordCounts = input
+  //count the words
+  .flatMap(t => {
+  //create tuples docId, word, 1
+  t._2.map(s => (t._1, s, 1))
+})
+  .filter(t => 
!transformParameters.apply(StopWordParameter).contains(t._2))
+  //group by document and word
+  .groupBy(0, 1)
+  // calculate the occurrence count of each word in specific document
+  .sum(2)
+
+val dictionary = wordCounts
+  .map(t => LinkedHashSet(t._2))
+  .reduce((set1, set2) => set1 ++ set2)
+  .map(set => set.zipWithIndex.toMap)
+  .flatMap(m => m.toList)
+
+val numberOfWords = wordCounts
+  .map(t => (t._2))
+  .distinct(t => t)
+  .map(t => 1)
+  .reduce(_ + _);
+
+val idf: DataSet[(String, Double)] = calculateIDF(wordCounts)
+val tf: DataSet[(Int, String, Int)] = wordCounts
+
+// docId, word, tfIdf
+val tfIdf = tf.join(idf).where(1).equalTo(0) {
+  (t1, t2) => (t1._1, t1._2, t1._3.toDouble * t2._2)
+}
+
+val res = tfIdf.crossWithTiny(numberOfWords)
+  // docId, word, tfIdf, numberOfWords
+  .map(t => (t._1._1, t._1._2, t._1._3, t._2))
+  //assign every word its position
+  .joinWithHuge(dictionary).where(1).equalTo(0)
--- End diff --

Use `.joinWithTiny` or `.join` or a broadcast variable.
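
A minimal sketch of the first option, with hypothetical names for the 
intermediate data sets (the dictionary is the small input here):

```scala
// Hint that the dictionary side is tiny instead of huge; alternatively use a
// plain join and let the optimizer choose, or broadcast the dictionary.
val withPositions = tfIdfAndCount
  .joinWithTiny(dictionary).where(1).equalTo(0) {
    (left, dictEntry) => (left._1, dictEntry._2, left._3) // (docId, wordIndex, tfIdf)
  }
```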


---


[GitHub] flink pull request: basic TfidfTransformer

2015-05-27 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/730#discussion_r31113926
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala
 ---
@@ -0,0 +1,107 @@
+package org.apache.flink.ml.feature
+
+import org.apache.flink.api.scala.{DataSet, _}
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.math.SparseVector
+
+import scala.math.log
+import scala.util.hashing.MurmurHash3
+import scala.collection.mutable.LinkedHashSet;
+
+/**
+ * This transformer calcuates the term-frequency times inverse document 
frequency for the give DataSet of documents.
+ * The DataSet will be treated as the corpus of documents.
+ * 
+ * The TF is the frequence of a word inside one document
+ * 
+ * The IDF for a word is calculated: log(total number of documents / 
documents that contain the word) + 1
+ * 
+ * This transformer returns a SparseVector where the index is the hash of 
the word and the value the tf-idf.
+ * @author Ronny Bräunlich
+ * @author Vassil Dimov
+ * @author Filip Perisic
+ */
+class TfIdfTransformer extends Transformer[(Int, Seq[String]), (Int, 
SparseVector)] {
+
+  override def transform(input: DataSet[(Int /* docId */ , Seq[String] 
/*The document */ )], transformParameters: ParameterMap): DataSet[(Int, 
SparseVector)] = {
+
+val params = transformParameters.get(StopWordParameter)
+
+// Here we will store the words in he form (docId, word, count)
+// Count represent the occurrence of "word" in document "docId"
+val wordCounts = input
+  //count the words
+  .flatMap(t => {
+  //create tuples docId, word, 1
+  t._2.map(s => (t._1, s, 1))
+})
+  .filter(t => 
!transformParameters.apply(StopWordParameter).contains(t._2))
+  //group by document and word
+  .groupBy(0, 1)
+  // calculate the occurrence count of each word in specific document
+  .sum(2)
+
+val dictionary = wordCounts
+  .map(t => LinkedHashSet(t._2))
+  .reduce((set1, set2) => set1 ++ set2)
+  .map(set => set.zipWithIndex.toMap)
--- End diff --

Make sure that you have exactly one `map ; flatMap` chain.
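
For example, the last two steps could collapse into a single `flatMap` (a 
sketch, assuming only the (word, index) pairs are needed downstream):

```scala
// Produce the (word, index) pairs directly instead of building a Map first.
.flatMap(set => set.zipWithIndex)
```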


---


[GitHub] flink pull request: basic TfidfTransformer

2015-05-27 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/730#discussion_r31113442
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/TfIdfTransformer.scala
 ---
@@ -0,0 +1,107 @@
+package org.apache.flink.ml.feature
+
+import org.apache.flink.api.scala.{DataSet, _}
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.math.SparseVector
+
+import scala.math.log
+import scala.util.hashing.MurmurHash3
+import scala.collection.mutable.LinkedHashSet;
+
+/**
+ * This transformer calcuates the term-frequency times inverse document 
frequency for the give DataSet of documents.
+ * The DataSet will be treated as the corpus of documents.
+ * 
+ * The TF is the frequence of a word inside one document
+ * 
+ * The IDF for a word is calculated: log(total number of documents / 
documents that contain the word) + 1
+ * 
+ * This transformer returns a SparseVector where the index is the hash of 
the word and the value the tf-idf.
+ * @author Ronny Bräunlich
+ * @author Vassil Dimov
+ * @author Filip Perisic
+ */
+class TfIdfTransformer extends Transformer[(Int, Seq[String]), (Int, 
SparseVector)] {
+
+  override def transform(input: DataSet[(Int /* docId */ , Seq[String] 
/*The document */ )], transformParameters: ParameterMap): DataSet[(Int, 
SparseVector)] = {
+
+val params = transformParameters.get(StopWordParameter)
+
+// Here we will store the words in he form (docId, word, count)
+// Count represent the occurrence of "word" in document "docId"
+val wordCounts = input
+  //count the words
+  .flatMap(t => {
+  //create tuples docId, word, 1
+  t._2.map(s => (t._1, s, 1))
+})
+  .filter(t => 
!transformParameters.apply(StopWordParameter).contains(t._2))
+  //group by document and word
+  .groupBy(0, 1)
+  // calculate the occurrence count of each word in specific document
+  .sum(2)
+
+val dictionary = wordCounts
+  .map(t => LinkedHashSet(t._2))
+  .reduce((set1, set2) => set1 ++ set2)
--- End diff --

Better use `groupReduce`, which will give you the materialized set right 
away.
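
Something along these lines (a sketch using the Scala API's `reduceGroup`; 
names as in the diff above):

```scala
// Materialize all words once and index them, instead of merging many
// single-element sets pairwise.
val dictionary = wordCounts
  .map(t => t._2)                                          // keep only the word
  .reduceGroup(words => words.toSeq.distinct.zipWithIndex) // one indexed set
  .flatMap(pairs => pairs)                                 // back to (word, index)
```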


---


[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-10 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/665#issuecomment-100685242
  
Nice job! :+1: 


---


[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-10 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/665#discussion_r30005092
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/extraction/FeatureHasher.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.extraction
+
+import java.nio.charset.Charset
+
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.feature.extraction.FeatureHasher.{NonNegative, 
NumFeatures}
+import org.apache.flink.ml.math.{Vector, SparseVector}
+
+import scala.util.hashing.MurmurHash3
+
+
+/** This transformer turns sequences of symbolic feature names (strings) 
into
+  * flink.ml.math.SparseVectors, using a hash function to compute the 
matrix column corresponding
+  * to a name. Aka the hashing trick.
+  * The hash function employed is the signed 32-bit version of Murmurhash3.
+  *
+  * By default for [[FeatureHasher]] transformer numFeatures=2^20 and nonNegative=false.
+  *
+  * This transformer takes a [[Seq]] of strings and maps it to a
+  * feature [[Vector]].
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.common.Learner]] implementations which expect an 
input of
+  * [[Vector]].
+  *
+  * @example
+  *  {{{
+  *val trainingDS: DataSet[Seq[String]] = 
env.fromCollection(data)
+  *val transformer = 
FeatureHasher().setNumFeatures(65536).setNonNegative(false)
+  *
+  *transformer.transform(trainingDS)
+  *  }}}
+  *
+  * =Parameters=
+  *
+  * - [[FeatureHasher.NumFeatures]]: The number of features (entries) in 
the output vector;
+  * by default equal to 2^20
+  * - [[FeatureHasher.NonNegative]]: Whether output vector should contain 
non-negative values only.
+  * When True, output values can be interpreted as frequencies. When 
False, output values will have
+  * expected value zero; by default equal to false
+  */
+class FeatureHasher extends Transformer[Seq[String], Vector] with 
Serializable {
+
+  // The seed used to initialize the hasher
+  val Seed = 0
+
+  /** Sets the number of features (entries) in the output vector
+*
+* @param numFeatures the user-specified numFeatures value. In case the 
user gives a value less
+*than 1, numFeatures is set to its default value: 
2^20
+* @return the FeatureHasher instance with its numFeatures value set to 
the user-specified value
+*/
+  def setNumFeatures(numFeatures: Int): FeatureHasher = {
+// number of features must be greater zero
+if(numFeatures < 1) {
+  return this
--- End diff --

This might cause a small debugging hell. Throw a `RuntimeException` or at 
least log a `WARN` message here.
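
For instance (a sketch; `require` throws an `IllegalArgumentException`, i.e. a 
`RuntimeException`):

```scala
def setNumFeatures(numFeatures: Int): FeatureHasher = {
  // fail fast on invalid input instead of silently keeping the default
  require(numFeatures >= 1, s"numFeatures must be at least 1, got $numFeatures")
  parameters.add(NumFeatures, numFeatures)
  this
}
```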


---


[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-10 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/665#discussion_r30005080
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/extraction/FeatureHasher.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.extraction
+
+import java.nio.charset.Charset
+
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.feature.extraction.FeatureHasher.{NonNegative, 
NumFeatures}
+import org.apache.flink.ml.math.{Vector, SparseVector}
+
+import scala.util.hashing.MurmurHash3
+
+
+/** This transformer turns sequences of symbolic feature names (strings) 
into
+  * flink.ml.math.SparseVectors, using a hash function to compute the 
matrix column corresponding
+  * to a name. Aka the hashing trick.
+  * The hash function employed is the signed 32-bit version of Murmurhash3.
+  *
+  * By default for [[FeatureHasher]] transformer numFeatures=2^20 and nonNegative=false.
+  *
+  * This transformer takes a [[Seq]] of strings and maps it to a
+  * feature [[Vector]].
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.common.Learner]] implementations which expect an 
input of
+  * [[Vector]].
+  *
+  * @example
+  *  {{{
+  *val trainingDS: DataSet[Seq[String]] = 
env.fromCollection(data)
+  *val transformer = 
FeatureHasher().setNumFeatures(65536).setNonNegative(false)
+  *
+  *transformer.transform(trainingDS)
+  *  }}}
+  *
+  * =Parameters=
+  *
+  * - [[FeatureHasher.NumFeatures]]: The number of features (entries) in 
the output vector;
+  * by default equal to 2^20
+  * - [[FeatureHasher.NonNegative]]: Whether output vector should contain 
non-negative values only.
+  * When True, output values can be interpreted as frequencies. When 
False, output values will have
+  * expected value zero; by default equal to false
+  */
+class FeatureHasher extends Transformer[Seq[String], Vector] with 
Serializable {
+
+  // The seed used to initialize the hasher
+  val Seed = 0
+
+  /** Sets the number of features (entries) in the output vector
+*
+* @param numFeatures the user-specified numFeatures value. In case the 
user gives a value less
+*than 1, numFeatures is set to its default value: 
2^20
+* @return the FeatureHasher instance with its numFeatures value set to 
the user-specified value
+*/
+  def setNumFeatures(numFeatures: Int): FeatureHasher = {
+// number of features must be greater than zero
+if(numFeatures < 1) {
+  return this
+}
+parameters.add(NumFeatures, numFeatures)
+this
+  }
+
+  /** Sets whether output vector should contain non-negative values only
+*
+* @param nonNegative the user-specified nonNegative value.
+* @return the FeatureHasher instance with its nonNegative value set to 
the user-specified value
+*/
+  def setNonNegative(nonNegative: Boolean): FeatureHasher = {
+parameters.add(NonNegative, nonNegative)
+this
+  }
+
+  override def transform(input: DataSet[Seq[String]], parameters: 
ParameterMap):
+  DataSet[Vector] = {
+val resultingParameters = this.parameters ++ parameters
+
+val nonNegative = resultingParameters(NonNegative)
+val numFeatures = resultingParameters(NumFeatures)
+
+// each item of the sequence is hashed and transformed into a tuple 
(index, value)
+input.map {
+  inputSeq => {
+val entries = inputSeq.map {
+  s => {
+// unicode strings are converted to utf-8
+// bytesHash is faster than arrayHash, becau
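
For reference, the scaladoc above boils down to the following self-contained sketch of the hashing trick (illustrative only, not the PR's code; the helper name and the exact `nonNegative` handling are assumptions):

```scala
import java.nio.charset.Charset
import scala.util.hashing.MurmurHash3

// Sketch of the hashing trick described above: each feature name is hashed
// with the signed 32-bit MurmurHash3, the vector index is the hash modulo
// numFeatures, and the sign of the hash decides whether the entry
// contributes +1.0 or -1.0. With nonNegative = true the summed values are
// mapped to their absolute values at the end.
def hashFeatures(names: Seq[String],
                 numFeatures: Int = 1 << 20,
                 nonNegative: Boolean = false,
                 seed: Int = 0): Map[Int, Double] = {
  val utf8 = Charset.forName("UTF-8")
  val signed = names.foldLeft(Map.empty[Int, Double]) { (acc, name) =>
    val h = MurmurHash3.bytesHash(name.getBytes(utf8), seed)
    val index = ((h % numFeatures) + numFeatures) % numFeatures // non-negative index
    val value = if (h >= 0) 1.0 else -1.0
    acc.updated(index, acc.getOrElse(index, 0.0) + value)
  }
  if (nonNegative) signed.map { case (i, v) => (i, math.abs(v)) } else signed
}
```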

[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-10 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/665#discussion_r30005062
  
--- Diff: 
flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/feature/extraction/FeatureHasher.scala
 ---
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.extraction
+
+import java.nio.charset.Charset
+
+import org.apache.flink.api.scala._
+import org.apache.flink.ml.common.{Parameter, ParameterMap, Transformer}
+import org.apache.flink.ml.feature.extraction.FeatureHasher.{NonNegative, 
NumFeatures}
+import org.apache.flink.ml.math.{Vector, SparseVector}
+
+import scala.util.hashing.MurmurHash3
+
+
+/** This transformer turns sequences of symbolic feature names (strings) 
into
+  * flink.ml.math.SparseVectors, using a hash function to compute the 
matrix column corresponding
+  * to a name. Aka the hashing trick.
+  * The hash function employed is the signed 32-bit version of Murmurhash3.
+  *
+  * By default for [[FeatureHasher]] transformer numFeatures=2^20 and nonNegative=false.
+  *
+  * This transformer takes a [[Seq]] of strings and maps it to a
+  * feature [[Vector]].
+  *
+  * This transformer can be prepended to all [[Transformer]] and
+  * [[org.apache.flink.ml.common.Learner]] implementations which expect an 
input of
+  * [[Vector]].
+  *
+  * @example
+  *  {{{
+  *val trainingDS: DataSet[Seq[String]] = 
env.fromCollection(data)
+  *val transformer = 
FeatureHasher().setNumFeatures(65536).setNonNegative(false)
+  *
+  *transformer.transform(trainingDS)
+  *  }}}
+  *
+  * =Parameters=
+  *
+  * - [[FeatureHasher.NumFeatures]]: The number of features (entries) in 
the output vector;
+  * by default equal to 2^20
+  * - [[FeatureHasher.NonNegative]]: Whether output vector should contain 
non-negative values only.
+  * When True, output values can be interpreted as frequencies. When 
False, output values will have
+  * expected value zero; by default equal to false
+  */
+class FeatureHasher extends Transformer[Seq[String], Vector] with 
Serializable {
+
+  // The seed used to initialize the hasher
+  val Seed = 0
+
+  /** Sets the number of features (entries) in the output vector
+*
+* @param numFeatures the user-specified numFeatures value. In case the 
user gives a value less
+*than 1, numFeatures is set to its default value: 
2^20
+* @return the FeatureHasher instance with its numFeatures value set to 
the user-specified value
+*/
+  def setNumFeatures(numFeatures: Int): FeatureHasher = {
+// number of features must be greater than zero
+if(numFeatures < 1) {
+  return this
+}
+parameters.add(NumFeatures, numFeatures)
+this
+  }
+
+  /** Sets whether output vector should contain non-negative values only
+*
+* @param nonNegative the user-specified nonNegative value.
+* @return the FeatureHasher instance with its nonNegative value set to 
the user-specified value
+*/
+  def setNonNegative(nonNegative: Boolean): FeatureHasher = {
+parameters.add(NonNegative, nonNegative)
+this
+  }
+
+  override def transform(input: DataSet[Seq[String]], parameters: 
ParameterMap):
+  DataSet[Vector] = {
+val resultingParameters = this.parameters ++ parameters
+
+val nonNegative = resultingParameters(NonNegative)
+val numFeatures = resultingParameters(NumFeatures)
+
+// each item of the sequence is hashed and transformed into a tuple 
(index, value)
+input.map {
+  inputSeq => {
+val entries = inputSeq.map {
+  s => {
+// unicode strings are converted to utf-8
+// bytesHash is faster than arrayHash, becau

[GitHub] flink pull request: [FLINK-1735] Feature Hasher

2015-05-10 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/665#discussion_r30004988
  
--- Diff: 
flink-staging/flink-ml/src/test/scala/org/apache/flink/ml/feature/extraction/FeatureHasherSuite.scala
 ---
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.ml.feature.extraction
+
+import org.apache.flink.api.scala.{ExecutionEnvironment, _}
+import org.apache.flink.ml.math.SparseVector
+import org.apache.flink.test.util.FlinkTestBase
+import org.scalatest.{FlatSpec, Matchers}
+
+class FeatureHasherSuite
+  extends FlatSpec
+  with Matchers
+  with FlinkTestBase {
+
+  behavior of "Flink's Feature Hasher"
+
+  import FeatureHasherData._
+
+  it should "transform a sequence of strings into a sparse feature vector 
of given size" in {
+val env = ExecutionEnvironment.getExecutionEnvironment
+
+env.setParallelism(2)
+
+for (numFeatures <- numFeaturesTest) {
+  val inputDS = env.fromCollection(input)
+
+  val transformer = FeatureHasher()
+.setNumFeatures(numFeatures)
+
+  val transformedDS = transformer.transform(inputDS)
+  val results = transformedDS.collect()
+
+  for ((result, expectedResult) <- results zip 
expectedResults(numFeatures)) {
+result.equalsVector(expectedResult) should be(true)
+  }
+}
+  }
+
+  it should "transform a sequence of strings into a sparse feature vector 
of given size," +
+"with non negative entries" in {
+val env = ExecutionEnvironment.getExecutionEnvironment
+
+env.setParallelism(2)
+
+for (numFeatures <- numFeaturesTest) {
+  val inputDS = env.fromCollection(input)
+
+  val transformer = FeatureHasher()
+.setNumFeatures(numFeatures).setNonNegative(true)
+
+  val transformedDS = transformer.transform(inputDS)
+  val results = transformedDS.collect()
+
+  for ((result, expectedResult) <- results zip 
expectedResultsNonNegative(numFeatures)) {
+result.equalsVector(expectedResult) should be(true)
+  }
+}
+  }
+
+  it should "transform a sequence of strings into a sparse feature vector 
of default size," +
+" when parameter is less than 1" in {
+val env = ExecutionEnvironment.getExecutionEnvironment
+
+env.setParallelism(2)
+
+val inputDS = env.fromCollection(input)
+
+val numFeatures = 0
+
+val transformer = FeatureHasher()
+  .setNumFeatures(numFeatures).setNonNegative(false)
+
+val transformedDS = transformer.transform(inputDS)
+val results = transformedDS.collect()
+
+for (result <- results) {
+  result.size should equal(Math.pow(2, 20).toInt)
+}
+  }
+}
+
+object FeatureHasherData {
+
+  val input = Seq(
+"Two households both alike in dignity".split(" ").toSeq,
+"In fair Verona where we lay our scene".split(" ").toSeq,
+"From ancient grudge break to new mutiny".split(" ").toSeq,
+"Where civil blood makes civil hands unclean".split(" ").toSeq,
+"From forth the fatal loins of these two foes".split(" ").toSeq
+  )
+
+  /* 2^30 features can't be tested right now because the implementation of 
Vector.equalsVector
+  performs an index wise comparison on the two vectors, which takes 
approx. forever */
+  val numFeaturesTest = Seq(Math.pow(2, 4).toInt, Math.pow(2, 5).toInt, 
1234,
+Math.pow(2, 16).toInt, Math.pow(2, 20).toInt) //, Math.pow(2, 
30).toInt)
+
+  val expectedResults = List(
+16 -> List(
+  SparseVector.fromCOO(16, Map((0, 1.0), (1, 1.0), (2, -1.0), (14, 
-1.0))),
  

[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-04-23 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-95559261
  
@aljoscha I'm sorry, I cannot follow. Can you elaborate?

The idea is to add support for proper folders next to jars when opening an 
execution environment. Since folders cannot be handled the same way as jars 
during execution (i.e., serialized and shipped around by the blob manager), 
the assumption for folder paths is that they are accessible from all agents in 
the distributed runtime (JobManager + TaskManagers), e.g., via shared NFS 
folders.
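
A hypothetical usage under these semantics (host, port, and paths below are made up) could look like this:

```scala
import org.apache.flink.api.scala.ExecutionEnvironment

// Hypothetical sketch: the jar is serialized and shipped by the blob
// manager, while the folder is assumed to be readable on every agent
// (JobManager and all TaskManagers), e.g. via a shared NFS mount.
val env = ExecutionEnvironment.createRemoteEnvironment(
  "jobmanager-host", 6123,
  "/local/path/to/job.jar",     // shipped to the cluster
  "/nfs/shared/extra-classes/") // assumed visible cluster-wide
```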


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...

2015-04-21 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/605#discussion_r28818529
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/operatorstatistics/OperatorStatsAccumulatorsTest.java
 ---
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.contrib.operatorstatistics;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RichFlatMapFunction;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.DiscardingOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.util.Collector;
+import org.junit.Assert;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Random;
+
+public class OperatorStatsAccumulatorsTest extends AbstractTestBase {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(OperatorStatsAccumulatorsTest.class);
+
+   private static final String ACCUMULATOR_NAME = "op-stats";
+
+   public OperatorStatsAccumulatorsTest(){
+   super(new Configuration());
+   }
+
+   @Test
+   public void testAccumulator() throws Exception {
+
+   String input = "";
+
+   Random rand = new Random();
+
+   for (int i = 1; i < 1000; i++) {
+   if(rand.nextDouble()<0.2){
+   input+=String.valueOf(rand.nextInt(5))+"\n";
+   }else{
+   input+=String.valueOf(rand.nextInt(100))+"\n";
+   }
+   }
+
+   String inputFile = createTempFile("datapoints.txt", input);
+
+   ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+
+		env.readTextFile(inputFile).
+			flatMap(new StringToInt()).
+			output(new DiscardingOutputFormat<Tuple1<Integer>>());
+
+   JobExecutionResult result = env.execute();
+
+   OperatorStatistics globalStats = 
result.getAccumulatorResult(ACCUMULATOR_NAME);
+   LOG.debug("Global Stats");
+   LOG.debug(globalStats.toString());
+
+   OperatorStatistics merged = null;
+
+		Map<String, Object> accResults = result.getAllAccumulatorResults();
+   for (String accumulatorName:accResults.keySet()){
+   if (accumulatorName.contains(ACCUMULATOR_NAME+"-")){
+   OperatorStatistics localStats = 
(OperatorStatistics) accResults.get(accumulatorName);
+   if (merged == null){
+   merged = localStats.clone();
+   }else {
+   merged.merge(localStats);
+   }
+   LOG.debug("Local Stats: " + accumulatorName);
+   LOG.debug(localStats.toString());
+   }
+   }
+
+   Assert.assertEquals(globalStats.cardinality,999);
+   Assert.assertEquals(globalStats.estimateCountDistinct(),100);
+   Assert.assertTrue(globalStats.getHeavyHitters().size()>0 && 
globalStats.getHeavyHitters().size()<=5);
+   Assert.assertEquals(merged.getMin(),globalStats.getMin());
+   Assert.assertEquals(merged.getMax(),globalStats.getMax());
+   
Assert.assertEquals(merged.estimateCountDistinct(),globalStats.estima

[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...

2015-04-21 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/605#discussion_r28799879
  
--- Diff: 
flink-contrib/src/test/java/org/apache/flink/contrib/operatorstatistics/OperatorStatsAccumulatorsTest.java
 ---
@@ -0,0 +1,144 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.contrib.operatorstatistics;
+
+import org.apache.flink.api.common.JobExecutionResult;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RichFlatMapFunction;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.api.java.io.DiscardingOutputFormat;
+import org.apache.flink.api.java.tuple.Tuple1;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.test.util.AbstractTestBase;
+import org.apache.flink.util.Collector;
+import org.junit.Assert;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.Serializable;
+import java.util.Map;
+import java.util.Random;
+
+public class OperatorStatsAccumulatorsTest extends AbstractTestBase {
+
+   private static final Logger LOG = 
LoggerFactory.getLogger(OperatorStatsAccumulatorsTest.class);
+
+   private static final String ACCUMULATOR_NAME = "op-stats";
+
+   public OperatorStatsAccumulatorsTest(){
+   super(new Configuration());
+   }
+
+   @Test
+   public void testAccumulator() throws Exception {
+
+   String input = "";
+
+   Random rand = new Random();
+
+   for (int i = 1; i < 1000; i++) {
+   if(rand.nextDouble()<0.2){
+   input+=String.valueOf(rand.nextInt(5))+"\n";
+   }else{
+   input+=String.valueOf(rand.nextInt(100))+"\n";
+   }
+   }
+
+   String inputFile = createTempFile("datapoints.txt", input);
+
+   ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment();
+
+		env.readTextFile(inputFile).
+			flatMap(new StringToInt()).
+			output(new DiscardingOutputFormat<Tuple1<Integer>>());
+
+   JobExecutionResult result = env.execute();
+
+   OperatorStatistics globalStats = 
result.getAccumulatorResult(ACCUMULATOR_NAME);
+   LOG.debug("Global Stats");
+   LOG.debug(globalStats.toString());
+
+   OperatorStatistics merged = null;
+
+		Map<String, Object> accResults = result.getAllAccumulatorResults();
+   for (String accumulatorName:accResults.keySet()){
+   if (accumulatorName.contains(ACCUMULATOR_NAME+"-")){
+   OperatorStatistics localStats = 
(OperatorStatistics) accResults.get(accumulatorName);
+   if (merged == null){
+   merged = localStats.clone();
+   }else {
+   merged.merge(localStats);
+   }
+   LOG.debug("Local Stats: " + accumulatorName);
+   LOG.debug(localStats.toString());
+   }
+   }
+
+   Assert.assertEquals(globalStats.cardinality,999);
+   Assert.assertEquals(globalStats.estimateCountDistinct(),100);
+   Assert.assertTrue(globalStats.getHeavyHitters().size()>0 && 
globalStats.getHeavyHitters().size()<=5);
+   Assert.assertEquals(merged.getMin(),globalStats.getMin());
+   Assert.assertEquals(merged.getMax(),globalStats.getMax());
+   
Assert.assertEquals(merged.estimateCountDistinct(),globalStats.estima

[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...

2015-04-16 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/605#issuecomment-93719337
  
:+1: thanks for the great work! I'll review it (probably over the weekend) 
and would appreciate it if some of the core Flink committers (@sewen, 
@rmetzger, @fhueske) could also make a pass over the code.

One more caveat from me: this implements only the runtime aspect of the 
statistics-collecting logic. A second PR, which allows configuring the points 
where statistics should be tracked programmatically as part of the DataBag 
API, should follow.

@tammymendt and I were discussing a syntax along the lines of:

```scala
A = // some dataflow assembly code
A.withStatistics( "statsForX", keySelectorFn ) 

env.execute()

// grab the statistics after the execution is done
env.getAccumulator("statsForX")
```

Once this is in place we will play around and implement some ideas on 
incremental optimization.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...

2015-04-14 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/593#issuecomment-92841182
  
:clap:


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Fixup ReusingKeyGroupedIterator value iterator...

2015-04-01 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/559

Fixup ReusingKeyGroupedIterator value iterators in 
NonReusingSortMergeCoGroupIterator.

Correct me if I'm wrong, but I think that the 
NonReusingSortMergeCoGroupIterator should construct value iterators that are 
non-reusing as well; otherwise you get unexpected behavior like this:

```java

public static void main(String[] args) throws Exception {
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    env.getConfig().disableObjectReuse();

    // in1, in2: DataSet<StudentInfo>, declared elsewhere
    in1.coGroup(in2).where("name").equalTo("name").with(new SampleCoGroup());

    env.execute();
}

public static class SampleCoGroup implements
        CoGroupFunction<StudentInfo, StudentInfo, Tuple2<StudentInfo, StudentInfo>> {
    @Override
    public void coGroup(Iterable<StudentInfo> iterable, Iterable<StudentInfo> iterable1,
            Collector<Tuple2<StudentInfo, StudentInfo>> collector) throws Exception {
        // infos: a List<StudentInfo> field, declared elsewhere
        for (StudentInfo info : iterable1) {
            // faulty behavior: with a reusing value iterator, every entry
            // added to 'infos' points to the same (mutated) record object
            infos.add(info);
        }
        // ...
    }
}

```
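
Until the fix is merged, one defensive workaround is to deep-copy every element before storing it. A sketch (in Scala; `copySerializer` is a hypothetical `TypeSerializer[StudentInfo]` created from the element's TypeInformation and the job's ExecutionConfig):

```scala
import scala.collection.mutable

// Deep-copy each record before keeping a reference to it, so it does not
// matter whether the runtime hands out a reused object.
val infos = mutable.ArrayBuffer.empty[StudentInfo]
val it = iterable1.iterator()
while (it.hasNext) {
  infos += copySerializer.copy(it.next()) // fresh, caller-owned instance
}
```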

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aalexandrov/flink fix_reusing_iterator

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/559.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #559


commit c90d15cbedb1e5b587ac527e65659a9b85710d9f
Author: Alexander Alexandrov 
Date:   2015-04-01T16:08:26Z

Fixup ReusingKeyGroupedIterator value iterators in 
NonReusingSortMergeCoGroupIterator.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-20 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83989884
  
I changed it as @StephanEwen suggested. Can somebody with admin access 
please cancel all previous pending builds for this PR here:

https://travis-ci.org/apache/flink/pull_requests

We can save some energy and time :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-20 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83985616
  
All but one of the profiles (including the two scala 2.11 profiles) build 
with the latest rebase:

https://travis-ci.org/stratosphere/flink/builds/55147681

The one that fails stalls on a `flink-streaming-core` test case, which I 
think is not related to the Scala 2.11 migration (it is actually a build that 
uses 2.10). I suggest adapting the `.travis.yml` as follows in order to keep 
the number of builds <= 5:

* add `-Dscala-2.11` to one of the hadoop1 profiles
* add `-Dscala-2.11` to one of the hadoop2 profiles

and then merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83614068
  
> If I understand correctly, this does not affect the default mode of 
building against Scala 2.10

Correct. It just adds a profile that (optionally) builds with 2.11 and two 
Travis profiles that use the option.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83613932
  
> Can we "retrofit" some of the other builds to use Scala 2.11 ?

I guess @rmetzger should answer this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83521096
  
Rebased against the current master 
(798e59523f0599096a214c24045a8f011b53ddbc). 

Can we please proceed with merging? The POMs are changing on a daily basis, 
and I have to manually adapt them after each change. The current PR at least 
adds a profile and runs a travis build that will complain if the code or the 
POMs are not 2.11 compatible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-19 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-83491250
  
One more thing. If we agree to add suffixes and ship multi-build artifacts, 
and we are going to restructure the maven projects anyway, why not add suffixes 
to both sets of artifacts?

I understand the appeal of adding suffix only for 2.11 for backward 
compatibility reasons, but with the envisioned package restructure everybody 
will have to touch their poms when migrating client code anyway. 

Adding a suffix consistently for all versions makes dependency management 
easier on the client side. Users can then configure their dependency entries 
like this

```xml

<properties>
  <scala.tools.version>2.11</scala.tools.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core_${scala.tools.version}</artifactId>
    <version>0.9</version>
  </dependency>
  <dependency>
    <groupId>org.scalanlp</groupId>
    <artifactId>breeze_${scala.tools.version}</artifactId>
    <version>0.10</version>
  </dependency>
</dependencies>

```

and switch from 2.10 to 2.11 by just changing the property value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-18 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-82930555
  
Can we merge this before handling 
[FLINK-1712](https://issues.apache.org/jira/browse/FLINK-1712)?

I can open a separate PR that adds the suffixes and automates the 
deployment of two sets of artifacts later if we agree on that. The current PR 
at least introduces scala-2.11 and makes sure that newly contributed Scala code 
is both 2.10 and 2.11 compatible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-17 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-82266786
  
If we go with the suffix, we basically have two options:

1. Add the suffix only to modules that use Scala
2. Add the suffix to all maven modules, regardless of whether they use Scala 
or not

Downside of option (1) is that we might break artifact names incrementally if 
we add Scala to more modules in the future.
Downside of option (2) is the larger number of LOC that need to be adapted in 
the POMs.

My two cents are for (2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-16 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-8159
  
Hm... now it passes. This is weird...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-13 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-79048001
  
The scala-2.11 build

```bash
mvn -Dscala-2.11 clean install
```

fails on travis when trying to build the quickstart (@rmetzger reported the 
same problem above). It seems that the `flink-quickstart-scala` package 
[downloads and 
uses](https://travis-ci.org/stratosphere/flink/jobs/54233598#L8055) the current 
nightlies of `flink-scala` from Sonatype instead of using the package local 
version. On my local machine, the same command executes without a problem.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-13 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78942601
  
It seems that I accidentally removed the `maven-shade-plugin` section from 
the new `pom.xml` while rebasing. Let's see whether the tests pass now. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78502251
  
[Some of the tests 
failed](https://travis-ci.org/apache/flink/builds/53879571) but I'm not sure 
whether this is related to the changes or not. I get the following error in the 
three failed Travis jobs:

```
 [ERROR] Failed to execute goal on project XXX: Could not resolve 
dependencies for project org.apache.flink.archetypetest:testArtifact:jar:0.1: 
Could not find artifact 
org.apache.flink:flink-shaded-include-yarn:jar:0.9-SNAPSHOT in 
sonatype-snapshots (https://oss.sonatype.org/content/repositories/snapshots/)
```

Could it be that the new artifact has not yet made it to Sonatype?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78307840
  
Any remarks on my reply to the general comments from @rmetzger (scala 
version, suffix)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78262466
  
> Why are you using Scala 2.11.4? The mentioned bug seems to be fixed in 
2.11.6.

2.11.6 will be out in April, so the choice is between 2.11.5 (which is known 
to have issues) and 2.11.4.

> Why aren't we adding a _2.11 suffix to the Scala 2.11 Flink builds?

We can do this, and it certainly makes sense if you want to ship pre-builds 
of both versions. With the current setup, if you want to use Flink with 2.11 
you have to build and install the maven projects yourself. (I'm blindly 
following the Spark model here; let me know if you prefer the other option.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78348901
  
I rebased on the current master (includes the changes from PR #454 merged 
today). I'll take a look at the errors thrown on building later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/477#issuecomment-78333617
  
> The reason why I asked regarding Scala 2.11.6 was because this version is 
shown on the scala-lang website next to the download button.

Snap, I guess Christmas came early this year :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26210582
  
--- Diff: pom.xml ---
@@ -599,6 +599,38 @@ under the License.
 
+		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<name>!scala-2.11.profile</name>
+				</property>
+			</activation>
+			<properties>
+				<scala.version>2.10.4</scala.version>
+				<scala.binary.version>2.10</scala.binary.version>
+				<scala.macros.version>2.0.1</scala.macros.version>
+				<akka.version>2.3.7</akka.version>
+			</properties>
--- End diff --

OK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-11 Thread aalexandrov
Github user aalexandrov commented on a diff in the pull request:

https://github.com/apache/flink/pull/477#discussion_r26210554
  
--- Diff: flink-scala/pom.xml ---
@@ -236,4 +230,23 @@ under the License.
 
+	<profiles>
+		<profile>
+			<id>scala-2.10</id>
+			<activation>
+				<property>
+					<name>!scala-2.11.profile</name>
--- End diff --

I have created two profiles, `scala-2.10` and `scala-2.11`, and configured 
the activation to be mutually exclusive based on a dedicated property 
(`scala-2.11.profile`, but it could be changed to `scala-2.11`). If you want 
to build with 2.10 or 2.11, you do:

```bash
mvn package # for 2.10
mvn package -Dscala-2.11.profile # for 2.11
```

The 2.10 profile is (implicitly) activated by default at the moment because 
the activation property `scala-2.11.profile` is not set by default.

I can rewrite it as you suggested (explicit activation based on profile 
names), but then the syntax for building with 2.11 becomes somewhat more 
cumbersome:

```bash
mvn package # for 2.10
mvn package -P!scala-2.10,scala-2.11 # for 2.11
```

Bear in mind that the current setup does not prevent you from explicitly 
activating / deactivating profiles from the IDE or based on their names. The 
second set of commands should still work (I will verify this in a minute).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Add support for building Flink with Scala 2.11

2015-03-10 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/477

Add support for building Flink with Scala 2.11

This adds support for building Flink with Scala 2.11 as [per mailing list 
discussion](http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Offer-Flink-with-Scala-2-11-td4159.html).

I fixed the 2.11 version to 2.11.4 because 2.11.5 contains a [somewhat 
nasty bug](https://issues.scala-lang.org/browse/SI-9089) that might cause 
issues in some situations.

I also added a travis profile that builds with the `scala-2.11` profile 
enabled and `oraclejdk7` -- you might wish to change that to `oraclejdk8` or 
something else. Once this is merged I can take a look at the deploy scripts and 
extend them so we ship and deploy pre-builds for both 2.10 and 2.11.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stratosphere/flink scala_2.11

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #477


commit e3b021cb5ebbc3ef6250fd6c1777b4fa55f34952
Author: Alexander Alexandrov 
Date:   2015-02-27T21:53:32Z

Migrated Scala version to 2.11.4 in the project POMs.

commit f655761586df24d9263f211121437f933838fd69
Author: Alexander Alexandrov 
Date:   2015-02-27T21:53:54Z

Migrated Scala code to 2.11.4.

commit 896d3a4185962c30384f690021e193615fc66f98
Author: Alexander Alexandrov 
Date:   2015-03-10T23:35:18Z

Added a travis profile for a scala-2.11 build.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Remove 'incubator-' prefix from README.md.

2015-02-06 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/371

Remove 'incubator-' prefix from README.md.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/371.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #371


commit f8da7b89b4fa74c621983c7d0273d840349cf095
Author: Alexander Alexandrov 
Date:   2015-02-06T14:23:02Z

Remove 'incubator-' prefix from README.md.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Allow KeySelectors to implement ResultTypeQuer...

2015-02-04 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/354#issuecomment-72857627
  
I think that this only happens for the primitive types, and I believe it is 
by design.

If you want to inspect generic parameters, Scala forces you to use the 
Scala reflection API. There is no way around that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Allow KeySelectors to implement ResultTypeQuer...

2015-02-03 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/354#issuecomment-72660810
  
I think that [this StackOverflow article explains my 
problem](http://stackoverflow.com/questions/11586944/how-to-obtain-the-raw-datatype-of-a-parameter-of-a-field-that-is-specialized-in).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Allow KeySelectors to implement ResultTypeQuer...

2015-02-03 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/354#issuecomment-72660318
  
I would advocate adding this one as well, as a fallback option.

I have a situation where I want to use a KeySelector that might return Java 
TupleXX instances parameterized with Scala types, e.g.:

```
class SelectFoo extends KeySelector[Tuple3[Int, Int, String], Tuple2[Int, String]] {
  override def getKey(v: Tuple3[Int, Int, String]) = new Tuple2(v.f0, v.f2)
}
```

Even though in this case the generic parameters are available, the Scala 
types cannot be inferred, because the actual field type parameters are erased 
by Scala and are seen only as java.lang.Object from the Java reflection API.
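
For illustration, with the proposed fallback in place such a selector could declare its produced type explicitly. A sketch (the `TupleTypeInfo` wiring is an assumption, not code from this PR):

```scala
import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
import org.apache.flink.api.java.functions.KeySelector
import org.apache.flink.api.java.tuple.{Tuple2, Tuple3}
import org.apache.flink.api.java.typeutils.{ResultTypeQueryable, TupleTypeInfo}

// A KeySelector that side-steps erasure by declaring its produced type
// explicitly via ResultTypeQueryable (the fallback advocated above).
class SelectFoo extends KeySelector[Tuple3[Int, Int, String], Tuple2[Int, String]]
    with ResultTypeQueryable[Tuple2[Int, String]] {

  override def getKey(v: Tuple3[Int, Int, String]): Tuple2[Int, String] =
    new Tuple2(v.f0, v.f2)

  override def getProducedType: TypeInformation[Tuple2[Int, String]] =
    new TupleTypeInfo[Tuple2[Int, String]](
      BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO)
}
```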


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Added ResultTypeQueryable interface to TypeSer...

2015-01-30 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/349#issuecomment-72284550
  
@hsaputra: I just did: 
[FLINK-1464](https://issues.apache.org/jira/browse/FLINK-1464).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: Added ResultTypeQueryable interface to TypeSer...

2015-01-29 Thread aalexandrov
GitHub user aalexandrov opened a pull request:

https://github.com/apache/flink/pull/349

Added ResultTypeQueryable interface to TypeSerializerInputFormat.

It is currently impossible to use the `TypeSerializerInputFormat` with 
generic Tuple types.

For example, [this example 
gist](https://gist.github.com/aalexandrov/90bf21f66bf604676f37) fails with a

```
Exception in thread "main" 
org.apache.flink.api.common.InvalidProgramException: The type returned by the 
input format could not be automatically determined. Please specify the 
TypeInformation of the produced type explicitly.
at 
org.apache.flink.api.java.ExecutionEnvironment.readFile(ExecutionEnvironment.java:341)
at SerializedFormatExample$.main(SerializedFormatExample.scala:48)
at SerializedFormatExample.main(SerializedFormatExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
```
exception.

To fix the issue, I changed the constructor to take a `TypeInformation` 
instead of a `TypeSerializer` argument. If this is indeed a bug, I think that 
this is a good solution. 

Unfortunately the fix breaks the API. Feel free to change it if you find a 
more elegant solution compatible with the 0.8 branch.   
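
For illustration, a minimal sketch of the changed constructor in use (the path and the element type below are made up):

```scala
import org.apache.flink.api.java.io.TypeSerializerInputFormat
import org.apache.flink.api.scala._

// With the format constructed from TypeInformation, readFile can determine
// the produced type automatically instead of throwing the
// InvalidProgramException shown above.
val env = ExecutionEnvironment.getExecutionEnvironment
val ti = createTypeInformation[(Int, String)]
val input = env.readFile(new TypeSerializerInputFormat[(Int, String)](ti), "hdfs:///tmp/records")
```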

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aalexandrov/flink 
typeserializerinputformat_fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/349.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #349


commit 2c1816640574c182f3b9f556ec5bab5df6f56f2c
Author: Alexander Alexandrov 
Date:   2015-01-29T14:41:01Z

Added ResultTypeQueryable interface implementation to 
TypeSerializerInputFormat.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink commit comment: 9ee74f3cf7b04740d6b00edf45d9413ff40322cf

2015-01-15 Thread aalexandrov
Github user aalexandrov commented on commit 
9ee74f3cf7b04740d6b00edf45d9413ff40322cf:


https://github.com/apache/flink/commit/9ee74f3cf7b04740d6b00edf45d9413ff40322cf#commitcomment-9297124
  
In .gitignore on line 1:
Is removing all these lines a standard procedure for the release branches?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-986] [FLINK-25] [Distributed runtime] A...

2015-01-12 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/254#issuecomment-69539129
  
YES!!!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-986] [FLINK-25] [Distributed runtime] A...

2015-01-09 Thread aalexandrov
Github user aalexandrov commented on the pull request:

https://github.com/apache/flink/pull/254#issuecomment-69343812
  
I am excited!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---