Re: Thoughts About Object Reuse and Collection Execution

2015-02-27 Thread Ted Dunning
This is going to have profound performance implications if this is the only path for iteration. On Fri, Feb 27, 2015 at 10:58 PM, Stephan Ewen wrote: > I vote to have the key extractor return a new value each time. That means > that objects are not reused everywhere where it is possible, but s

Re: Could not build up connection to JobManager

2015-02-27 Thread Dulaj Viduranga
Here is the taskmanager log when I tried taskmanager.sh start flink-Vidura-taskmanager-localhost.log > On Feb 27, 2015, at 4:12 PM, Till Rohrmann wrote: > > It depends on how you started Flink.

Re: Could not build up connection to JobManager

2015-02-27 Thread Dulaj Viduranga
Hi, I’m thinking I’m doing something wrong. After setting jobManager address to 127.0.0.1, I can run kmeans example (java -cp ../examples/flink-java-examples-0.9-SNAPSHOT-KMeans.jar org.apache.flink.examples.java.clustering.util.KMeansDataGenerator 500 10 0.08) But I can’t run word count example

Re: Thoughts About Object Reuse and Collection Execution

2015-02-27 Thread Stephan Ewen
I vote to have the key extractor return a new value each time. That means that objects are not reused everywhere where it is possible, but still in most places, which still helps. What still puzzles me: I thought that the collection execution stores copies of the returned records by default (reuse

Re: Tweets Custom Input Format

2015-02-27 Thread Mustafa Elbehery
@robert, I have created the PR https://github.com/apache/flink/pull/442, On Fri, Feb 27, 2015 at 11:58 AM, Mustafa Elbehery < elbeherymust...@gmail.com> wrote: > @Robert, > > Thanks I was asking about the procedure. I have opened a Jira ticket for > Flink-Contrib and I will create a PR with th

Queries regarding RDFs with Flink

2015-02-27 Thread santosh_rajaguru
Hello, how can flink be useful for processing the data to RDFs and build the ontology? Regards, Santosh

[jira] [Created] (FLINK-1616) Action "list -r" gives IOException when there are running jobs

2015-02-27 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-1616: Summary: Action "list -r" gives IOException when there are running jobs Key: FLINK-1616 URL: https://issues.apache.org/jira/browse/FLINK-1616 Project: Flink

Re: [DISCUSS] Dedicated streaming mode and start scripts

2015-02-27 Thread Márton Balassi
Today we had a discussion with Robert on this issue. I would like to eventually have the streaming grouped and the windowing buffers/state maybe along with the crucial state of the user in the managed memory. If we had this, separating the two modes could become less important as streaming would als

Thoughts About Object Reuse and Collection Execution

2015-02-27 Thread Aljoscha Krettek
Hello Nation of Flink, while figuring out this bug: https://issues.apache.org/jira/browse/FLINK-1569 I came upon some difficulties. The problem is that the KeyExtractorMappers always return the same tuple. This is problematic, since Collection Execution does simply store the returned values in a li
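The aliasing problem described in this message can be illustrated with a minimal plain-Java sketch. It does not use any Flink API: `Pair`, `ReusingExtractor`, and `CopyingExtractor` are hypothetical stand-ins for a mutable tuple and for key extractors, and the list stands in for a collection that stores returned references without copying them.

```java
import java.util.ArrayList;
import java.util.List;

final class ObjectReuseSketch {

    // Hypothetical stand-in for a mutable (key, value) tuple.
    static final class Pair {
        int key;
        String value;
    }

    // Hypothetical extractor that mutates and returns the same instance on every call.
    static final class ReusingExtractor {
        private final Pair reused = new Pair();

        Pair extract(String value) {
            reused.key = value.length();
            reused.value = value;
            return reused;
        }
    }

    // Hypothetical extractor that allocates a new instance on every call.
    static final class CopyingExtractor {
        Pair extract(String value) {
            Pair fresh = new Pair();
            fresh.key = value.length();
            fresh.value = value;
            return fresh;
        }
    }

    // Stores the returned references in a list without copying them.
    static List<Integer> keysWithReuse(List<String> input) {
        ReusingExtractor extractor = new ReusingExtractor();
        List<Pair> stored = new ArrayList<>();
        for (String s : input) {
            stored.add(extractor.extract(s)); // every slot points to the same object
        }
        return keysOf(stored);
    }

    // Same loop, but each stored element is a distinct object.
    static List<Integer> keysWithFreshObjects(List<String> input) {
        CopyingExtractor extractor = new CopyingExtractor();
        List<Pair> stored = new ArrayList<>();
        for (String s : input) {
            stored.add(extractor.extract(s));
        }
        return keysOf(stored);
    }

    private static List<Integer> keysOf(List<Pair> pairs) {
        List<Integer> keys = new ArrayList<>();
        for (Pair p : pairs) {
            keys.add(p.key);
        }
        return keys;
    }
}
```

With the reusing extractor, every stored element reflects the last input; with the copying extractor, each element keeps its own key. This is only an illustration of the trade-off discussed in the thread, not the actual Flink code.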

Re: [DISCUSS] URI NullPointerException in TestBaseUtils

2015-02-27 Thread Márton Balassi
+1 On Fri, Feb 27, 2015 at 11:32 AM, Szabó Péter wrote: > Yeah, I agree, it is at best a cosmetic issue. I just wanted to let you > know about it. > > Peter > > > 2015-02-27 11:10 GMT+01:00 Till Rohrmann : > > > Catching the NullPointerException and throwing an > IllegalArgumentException > > wit

Re: Tweets Custom Input Format

2015-02-27 Thread Mustafa Elbehery
@Robert, Thanks I was asking about the procedure. I have opened a Jira ticket for Flink-Contrib and I will create a PR with the naming convention on Wiki, https://issues.apache.org/jira/browse/FLINK-1615, On Fri, Feb 27, 2015 at 11:55 AM, Robert Metzger wrote: > I'm glad you've found the how

[jira] [Created] (FLINK-1615) Introduces a new InputFormat for Tweets

2015-02-27 Thread mustafa elbehery (JIRA)
mustafa elbehery created FLINK-1615: --- Summary: Introduces a new InputFormat for Tweets Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New F

Re: Tweets Custom Input Format

2015-02-27 Thread Robert Metzger
I'm glad you've found the how to contribute guide. I can not describe the process to open a pull request better than already written in the guide. Maybe this link is also helpful for you: https://help.github.com/articles/creating-a-pull-request/ Are you facing a particular error message? Maybe th

Re: Could not build up connection to JobManager

2015-02-27 Thread Till Rohrmann
It depends on how you started Flink. If you started a local cluster, then the TaskManager log is contained in the JobManager log we just don't see the respective log output in the snippet you posted. If you started a TaskManager independently, either by taskmanager.sh or by start-cluster.sh, then a

Re: [DISCUSS] URI NullPointerException in TestBaseUtils

2015-02-27 Thread Szabó Péter
Yeah, I agree, it is at best a cosmetic issue. I just wanted to let you know about it. Peter 2015-02-27 11:10 GMT+01:00 Till Rohrmann : > Catching the NullPointerException and throwing an IllegalArgumentException > with a meaningful message might clarify things. > > Considering that it only aff

[jira] [Created] (FLINK-1614) JM Webfrontend doesn't always show the correct state of Tasks

2015-02-27 Thread Robert Metzger (JIRA)
Robert Metzger created FLINK-1614: - Summary: JM Webfrontend doesn't always show the correct state of Tasks Key: FLINK-1614 URL: https://issues.apache.org/jira/browse/FLINK-1614 Project: Flink

Re: [DISCUSS] URI NullPointerException in TestBaseUtils

2015-02-27 Thread Till Rohrmann
Catching the NullPointerException and throwing an IllegalArgumentException with a meaningful message might clarify things. Considering that it only affects the TestBaseUtils, it should not be big deal to change it. On Fri, Feb 27, 2015 at 10:30 AM, Szabó Péter wrote: > The following code snippe

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Szabó Péter
No problem. I will not commit the modification until it is clarified. Peter 2015-02-27 10:48 GMT+01:00 Gyula Fóra : > I can't look at it at the moment, I am on vacation and don't have my > laptop. > On Feb 27, 2015 9:41 AM, "Szabó Péter" wrote: > > > Okay, thanks! > > > > In my case, I tried to

Re: Tweets Custom Input Format

2015-02-27 Thread Mustafa Elbehery
Actually, I am reading "How to contribute" now to push the code. It's working and tested locally and on the cluster, and I have used it for an ETL. The structure is as follows: Java Pojos for the tweet object and the nested objects. Parser class using an event-driven approach, and the SimpleTweetInput

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Gyula Fóra
I can't look at it at the moment, I am on vacation and don't have my laptop. On Feb 27, 2015 9:41 AM, "Szabó Péter" wrote: > Okay, thanks! > > In my case, I tried to run an ITCase test and the environment parallelism > happened to be -1, and an exception was thrown. The other ITCases ran > pro

Re: Tweets Custom Input Format

2015-02-27 Thread Robert Metzger
Hi, cool! Can you generalize the input format to read JSON into an arbitrary POJO? It would be great if you could contribute the InputFormat into the "flink-contrib" module. I've seen many users reading JSON data with Flink, so it's good to have a standard solution for that. If you want you can ad

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Szabó Péter
Okay, thanks! In my case, I tried to run an ITCase test and the environment parallelism happened to be -1, and an exception was thrown. The other ITCases ran properly, so I figured the problem is with the windowing. Can you check it out for me? (WindowedDataStream, line 348) Peter 2015-02-27

Re: Tweets Custom Input Format

2015-02-27 Thread Mustafa Elbehery
Hi, I am really sorry for being so late, it was a whole month of projects and examination, I was really busy. @Robert, it is IF for reading tweet into Pojo. I use an event-driven parser, I retrieve most of the tweet into Java Pojos, it was tested on 1TB dataset, for a Flink ETL job, and the perfo

[DISCUSS] URI NullPointerException in TestBaseUtils

2015-02-27 Thread Szabó Péter
The following code snippet is from TestBaseUtils: protected static File asFile(String path) { try { URI uri = new URI(path); if (uri.getScheme().equals("file")) { return new File(uri.getPath()); } else { throw new IllegalArgumentException("This path does not
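A possible source of the NullPointerException in the snippet above is that `URI.getScheme()` returns `null` when the path has no scheme (for example a plain `/tmp/data.txt`). The following is a minimal sketch of a null-safe variant, not the actual TestBaseUtils code: the class name `AsFileSketch` and the exception messages are assumptions made for illustration.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

final class AsFileSketch {

    // Hypothetical null-safe variant of asFile; messages are illustrative only.
    static File asFile(String path) {
        try {
            URI uri = new URI(path);
            // Calling equals on the constant avoids a NullPointerException
            // when getScheme() returns null for a scheme-less path.
            if ("file".equals(uri.getScheme())) {
                return new File(uri.getPath());
            }
            throw new IllegalArgumentException("Expected a file: URI but got: " + path);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + path, e);
        }
    }
}
```

With this variant, a scheme-less path yields an IllegalArgumentException with a descriptive message instead of a NullPointerException, which matches the direction suggested in the replies to this thread.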

Re: Drop support for CDH4 / Hadoop 2.0.0-alpha

2015-02-27 Thread Robert Metzger
@Henry: We would still shade Hadoop because of its Guava / ASM dependencies which interfere with our dependencies. The nice thing about my change is that all the other Flink modules don't have to care about the details of our Hadoop dependencies. It's basically an abstract Hadoop dependency, without gua

Re: Contributing to Flink

2015-02-27 Thread Max Michels
Hi Niraj, Pleased to hear you want to start contributing to Flink :) In terms of security, there are some open issues. Like Robert mentioned, it would be great if you could implement proper HDFS Kerberos authentication. Basically, the HDFS Delegation Token needs to be transferred to the workers so

Re: Flink Streaming parallelism bug report

2015-02-27 Thread Gyula Fóra
They should actually return different values in many cases. Datastream.env.getDegreeOfParallelism returns the environment parallelism (default) Datastream.getparallelism() returns the parallelism of the operator. There is a reason when one or the other is used. Please watch out when you try to m

Flink Streaming parallelism bug report

2015-02-27 Thread Szabó Péter
As far as I know, the creation time of the execution environment has been slightly modified in the streaming API, so dataStream.getParallelism() and dataStream.env.getDegreeOfParallelism() may now return different values. Usage of the former is recommended. In theory, the latter is eliminate

Re: [DISCUSS] Iterative streaming example

2015-02-27 Thread Szabó Péter
Cool! At the moment I don't have any good use cases, but I will read some literature about it in the near future. The first priority for me is to make a good streaming iteration example, and Márton liked the machine-learning idea. That, and there is a group in SZTAKI that develops recommendation sy