Re: Choosing random element

2015-06-16 Thread Sachin Goel
Hi
If you're looking for an implementation, I wrote this in the context of a
random k-means initialization. Take a look at the {{weightedFit}} function.
https://github.com/sachingoel0101/flink/blob/clustering_initializations/flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/clustering/KMeansRandomInit.scala

Note that numCluster is the number of points you need in the sample.
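
For reference, a minimal sketch of another way to pick ONE element without
shuffling the whole set (this is not the linked weightedFit code, just an
illustration): attach a random key to every element and keep the element with
the smallest key, so each element is equally likely to be chosen. It assumes an
ExecutionEnvironment named env and the usual DataSet API imports.

    // Minimal sketch: pick one uniformly random element without a full shuffle.
    // Only one (key, element) pair per combiner is sent over the network.
    DataSet<String> data = env.fromElements("a", "b", "c", "d");

    DataSet<Tuple2<Double, String>> chosen = data
            .map(new MapFunction<String, Tuple2<Double, String>>() {
                @Override
                public Tuple2<Double, String> map(String value) {
                    // give every element an independent uniform random key
                    return new Tuple2<Double, String>(Math.random(), value);
                }
            })
            .reduce(new ReduceFunction<Tuple2<Double, String>>() {
                @Override
                public Tuple2<Double, String> reduce(Tuple2<Double, String> a,
                                                     Tuple2<Double, String> b) {
                    // keep the element whose random key is smallest
                    return a.f0 <= b.f0 ? a : b;
                }
            });

    chosen.print();  // field f1 of the single result is the randomly chosen element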

Regards
Sachin Goel

On Tue, Jun 16, 2015 at 2:45 PM, Maximilian Alber <
alber.maximil...@gmail.com> wrote:

> Thanks!
> Cheers,
> Max
>
> On Tue, Jun 16, 2015 at 11:01 AM, Till Rohrmann 
> wrote:
>
>> This might help you [1].
>>
>> Cheers,
>> Till
>>
>> [1]
>> http://stackoverflow.com/questions/2514061/how-to-pick-random-small-data-samples-using-map-reduce
>>
>>
>> On Tue, Jun 16, 2015 at 10:16 AM Maximilian Alber <
>> alber.maximil...@gmail.com> wrote:
>>
>>> Hi Flinksters,
>>>
>>> again a similar problem. I would like to choose ONE random element out
>>> of a data set, without shuffling the whole set. Again I would like to have
>>> the element (mathematically) randomly chosen.
>>>
>>> Thanks!
>>> Cheers,
>>> Max
>>>
>>
>


Re: Run Time Exception

2015-07-19 Thread Sachin Goel
Hi
You do not need to call env.execute() after a print() call; print() itself
triggers the execution. The reason for the exception is that after the print()
call, no new sink is defined, so there is nothing left for the subsequent
env.execute() to run.
You can, however, explicitly define a sink and then call env.execute().
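
For illustration, a minimal Java sketch of the two valid patterns (the Scala
API behaves the same way); the inline tokenizer and the output path are just
examples, and the usual imports (ExecutionEnvironment, DataSet, Tuple2,
FlatMapFunction, Collector) are omitted:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> text = env.readTextFile("/Users/hadoop2/Data/word.txt");

    DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.split(" ")) {
                        out.collect(new Tuple2<String, Integer>(word, 1));
                    }
                }
            })
            .groupBy(0)
            .sum(1);

    // Pattern 1: print() is eager and triggers the execution itself -- no env.execute() afterwards.
    counts.print();

    // Pattern 2: define an explicit sink and trigger the execution yourself.
    // counts.writeAsCsv("/tmp/wordcount-out", "\n", " ");
    // env.execute("TEST");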

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Jul 19, 2015 at 8:06 PM, Madabhattula Rajesh Kumar <
mrajaf...@gmail.com> wrote:

> Hi,
>
> I have written simple wordcount program in scala. When I execute the
> program, I'm getting below exception.
>
> Please let me know how to fix this issue. I'm using Flink 0.9.0 version
>
> *Below is the program :-*
>
> val env = ExecutionEnvironment.getExecutionEnvironment
> // get input data
> val text = env readTextFile("/Users/hadoop2/Data/word.txt")
> val counts = text flatMap(l=>l split(" ")) map(word=>(word,1))
> groupBy(0) sum(1)
> // emit result
> counts print
> env.execute("TEST")
>
> *Exception :-*
>
> Exception in thread "main" java.lang.RuntimeException: No new data sinks
> have been defined since the last execution. The last execution refers to
> the latest call to 'execute()', 'count()', 'collect()', or 'print()'.
> at
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:910)
> at
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:893)
> at
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:50)
> at
> org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:590)
> at WordCount$.main(WordCount.scala:17)
> at WordCount.main(WordCount.scala)
>
> Regards,
> Rajesh
>


Re: loop break operation

2015-07-20 Thread Sachin Goel
Hi
You can use iterateWithTermination to terminate before the maximum number of
iterations is reached. The feedback for the iteration is then the pair (next
solution, isConverged), where isConverged is a data set that you leave empty
once you wish to terminate.
However, this is something I have a pull request for:
https://github.com/apache/flink/pull/918. Take a look.

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Mon, Jul 20, 2015 at 2:55 PM, Pa Rö 
wrote:

> hello community,
>
> I have written a k-means app in Flink. Now I want to change my termination
> condition from a maximum number of iterations to checking whether the cluster
> centers have changed, but I don't know how to break the Flink loop. Here is my
> execution code:
>
> public void run() {
> //load properties
> Properties pro = new Properties();
> FileSystem fs = null;
> try {
>
> pro.load(FlinkMain.class.getResourceAsStream("/config.properties"));
> fs = FileSystem.get(new
> URI(pro.getProperty("hdfs.namenode")),new
> org.apache.hadoop.conf.Configuration());
> } catch (Exception e) {
> e.printStackTrace();
> }
>
> int maxIteration =
> Integer.parseInt(pro.getProperty("maxiterations"));
> String outputPath =
> fs.getHomeDirectory()+pro.getProperty("flink.output");
> // set up execution environment
> ExecutionEnvironment env =
> ExecutionEnvironment.getExecutionEnvironment();
> // get input points
> DataSet points = getPointDataSet(env);
> DataSet centroids = null;
> try {
> centroids = getCentroidDataSet(env);
> } catch (Exception e1) {
> e1.printStackTrace();
> }
> // set number of bulk iterations for KMeans algorithm
> IterativeDataSet loop =
> centroids.iterate(maxIteration);
> DataSet newCentroids = points
> // compute closest centroid for each point
> .map(new
> SelectNearestCenter(this.getBenchmarkCounter())).withBroadcastSet(loop,
> "centroids")
> // count and sum point coordinates for each centroid
> .groupBy(0).reduceGroup(new CentroidAccumulator())
> // compute new centroids from point counts and coordinate sums
> .map(new CentroidAverager(this.getBenchmarkCounter()));
> // feed new centroids back into next iteration
> DataSet finalCentroids =
> loop.closeWith(newCentroids);
> DataSet> clusteredPoints = points
> // assign points to final clusters
> .map(new
> SelectNearestCenter(-1)).withBroadcastSet(finalCentroids, "centroids");
> // emit result
> clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " ");
> finalCentroids.writeAsText(outputPath+"/centers");//print();
> // execute program
> try {
> env.execute("KMeans Flink");
> } catch (Exception e) {
> e.printStackTrace();
> }
> }
>
> is it possible to use a construct like: if(centroids equals points){break
> the loop}???
>
> best regards,
> paul
>


Re: loop break operation

2015-07-20 Thread Sachin Goel
Gah. Sorry.
In the closeWith call, pass a second data set as the termination criterion;
the iteration stops as soon as that data set becomes empty.
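
For illustration, a self-contained toy (Java DataSet API) showing the
two-argument closeWith: the loop keeps running while the second data set is
non-empty and stops as soon as it becomes empty. The usual imports
(ExecutionEnvironment, DataSet, IterativeDataSet, MapFunction, FilterFunction)
are omitted.

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    IterativeDataSet<Long> loop = env.fromElements(1L).iterate(1000);

    DataSet<Long> doubled = loop.map(new MapFunction<Long, Long>() {
        @Override
        public Long map(Long value) {
            return value * 2;
        }
    });

    // termination criterion: keep iterating only while some value is still below 100
    DataSet<Long> notYetDone = doubled.filter(new FilterFunction<Long>() {
        @Override
        public boolean filter(Long value) {
            return value < 100;
        }
    });

    DataSet<Long> result = loop.closeWith(doubled, notYetDone);
    result.print();  // prints 128: the first value for which the criterion set was empty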

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Jul 20, 2015 3:21 PM, "Pa Rö"  wrote:

> I could not find the "iterateWithTermination" function, only "iterate" and
> "iterateDelta". I use Flink 0.9.0 with Java.
>
> 2015-07-20 11:30 GMT+02:00 Sachin Goel :
>
>> Hi
>> You can use iterateWithTermination to terminate before max iterations.
>> The feedback for iteration then would be (next solution, isConverged) where
>> isConverged is an empty data set if you wish to terminate.
>> However, this is something I have a pull request for:
>> https://github.com/apache/flink/pull/918. Take a look.
>>
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>> On Mon, Jul 20, 2015 at 2:55 PM, Pa Rö 
>> wrote:
>>
>>> hello community,
>>>
>>> I have written a k-means app in Flink. Now I want to change my termination
>>> condition from a maximum number of iterations to checking whether the cluster
>>> centers have changed, but I don't know how to break the Flink loop. Here is my
>>> execution code:
>>>
>>> public void run() {
>>> //load properties
>>> Properties pro = new Properties();
>>> FileSystem fs = null;
>>> try {
>>>
>>> pro.load(FlinkMain.class.getResourceAsStream("/config.properties"));
>>> fs = FileSystem.get(new
>>> URI(pro.getProperty("hdfs.namenode")),new
>>> org.apache.hadoop.conf.Configuration());
>>> } catch (Exception e) {
>>> e.printStackTrace();
>>> }
>>>
>>> int maxIteration =
>>> Integer.parseInt(pro.getProperty("maxiterations"));
>>> String outputPath =
>>> fs.getHomeDirectory()+pro.getProperty("flink.output");
>>> // set up execution environment
>>> ExecutionEnvironment env =
>>> ExecutionEnvironment.getExecutionEnvironment();
>>> // get input points
>>> DataSet points = getPointDataSet(env);
>>> DataSet centroids = null;
>>> try {
>>> centroids = getCentroidDataSet(env);
>>> } catch (Exception e1) {
>>> e1.printStackTrace();
>>> }
>>> // set number of bulk iterations for KMeans algorithm
>>> IterativeDataSet loop =
>>> centroids.iterate(maxIteration);
>>> DataSet newCentroids = points
>>> // compute closest centroid for each point
>>> .map(new
>>> SelectNearestCenter(this.getBenchmarkCounter())).withBroadcastSet(loop,
>>> "centroids")
>>> // count and sum point coordinates for each centroid
>>> .groupBy(0).reduceGroup(new CentroidAccumulator())
>>> // compute new centroids from point counts and coordinate
>>> sums
>>> .map(new CentroidAverager(this.getBenchmarkCounter()));
>>> // feed new centroids back into next iteration
>>> DataSet finalCentroids =
>>> loop.closeWith(newCentroids);
>>> DataSet> clusteredPoints =
>>> points
>>> // assign points to final clusters
>>> .map(new
>>> SelectNearestCenter(-1)).withBroadcastSet(finalCentroids, "centroids");
>>> // emit result
>>> clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " ");
>>> finalCentroids.writeAsText(outputPath+"/centers");//print();
>>> // execute program
>>> try {
>>> env.execute("KMeans Flink");
>>> } catch (Exception e) {
>>> e.printStackTrace();
>>> }
>>> }
>>>
>>> is it possible to use a construct like: if(centroids equals points){break
>>> the loop}???
>>>
>>> best regards,
>>> paul
>>>
>>
>>
>


Re: filter as termination condition

2015-07-22 Thread Sachin Goel
It appears that you're returning true when the previous and current
solution are the same. You should instead return false in that case,
because that is when the iteration should terminate (the termination
data set has to become empty for the loop to stop).
Further, instead of joining, it would be a good idea to broadcast the old
solution to an operator over the new solution [or the other way around] and
compare with some tolerance value instead of an exact equality check.
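
To make the second point concrete, here is a sketch of the broadcast-plus-
tolerance variant, meant to replace the join-based criterion in the quoted
program below. It plugs into the existing loop and newCentroids variables;
GeoTimeDataCenter's getId() and distanceTo() methods and the tolerance value
are assumptions about your classes, and the usual imports (RichFilterFunction,
Configuration, List, Map, HashMap) are omitted.

    public static final class CentroidMovedFilter extends RichFilterFunction<GeoTimeDataCenter> {

        private static final double TOLERANCE = 1e-9;  // what "unchanged" means for you
        private Map<Integer, GeoTimeDataCenter> previous;

        @Override
        public void open(Configuration parameters) {
            previous = new HashMap<Integer, GeoTimeDataCenter>();
            List<GeoTimeDataCenter> oldCentroids =
                    getRuntimeContext().getBroadcastVariable("previousCentroids");
            for (GeoTimeDataCenter c : oldCentroids) {
                previous.put(c.getId(), c);         // getId() is assumed to identify a centroid
            }
        }

        @Override
        public boolean filter(GeoTimeDataCenter current) {
            GeoTimeDataCenter old = previous.get(current.getId());
            // keep the centroid (i.e. keep iterating) only if it moved more than the tolerance
            return old == null || current.distanceTo(old) > TOLERANCE;
        }
    }

    // wiring it in, instead of the join(...).filter(new MyFilter()) criterion:
    DataSet<GeoTimeDataCenter> moved = newCentroids
            .filter(new CentroidMovedFilter())
            .withBroadcastSet(loop, "previousCentroids");
    DataSet<GeoTimeDataCenter> finalCentroids = loop.closeWith(newCentroids, moved);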

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Jul 22, 2015 5:46 PM, "Stephan Ewen"  wrote:

> Termination happens if the "termination criterion" data set is empty.
>
> Maybe your filter is too aggressive and filters out everything, or the
> join is wrong and nothing joins...
>
> On Tue, Jul 21, 2015 at 5:05 PM, Pa Rö 
> wrote:
>
>> hello,
>>
>> I have defined a filter as the termination condition for k-means.
>> When I run my app, it always computes only one iteration.
>>
>> I think the problem is here:
>> DataSet finalCentroids = loop.closeWith(newCentroids,
>> newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter()));
>> or maybe the filter function:
>> public static final class MyFilter implements
>> FilterFunction<Tuple2<GeoTimeDataCenter, GeoTimeDataCenter>> {
>>
>> private static final long serialVersionUID = 5868635346889117617L;
>>
>> public boolean filter(Tuple2<GeoTimeDataCenter, GeoTimeDataCenter> tuple) throws Exception {
>> if(tuple.f0.equals(tuple.f1)) {
>> return true;
>> }
>> else {
>> return false;
>> }
>> }
>> }
>>
>> best regards,
>> paul
>>
>> my full code here:
>>
>> public void run() {
>> //load properties
>> Properties pro = new Properties();
>> FileSystem fs = null;
>> try {
>>
>> pro.load(FlinkMain.class.getResourceAsStream("/config.properties"));
>> fs = FileSystem.get(new
>> URI(pro.getProperty("hdfs.namenode")),new
>> org.apache.hadoop.conf.Configuration());
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>>
>> int maxIteration =
>> Integer.parseInt(pro.getProperty("maxiterations"));
>> String outputPath =
>> fs.getHomeDirectory()+pro.getProperty("flink.output");
>> // set up execution environment
>> ExecutionEnvironment env =
>> ExecutionEnvironment.getExecutionEnvironment();
>> // get input points
>> DataSet points = getPointDataSet(env);
>> DataSet centroids = null;
>> try {
>> centroids = getCentroidDataSet(env);
>> } catch (Exception e1) {
>> e1.printStackTrace();
>> }
>> // set number of bulk iterations for KMeans algorithm
>> IterativeDataSet loop =
>> centroids.iterate(maxIteration);
>> DataSet newCentroids = points
>> // compute closest centroid for each point
>> .map(new
>> SelectNearestCenter(this.getBenchmarkCounter())).withBroadcastSet(loop,
>> "centroids")
>> // count and sum point coordinates for each centroid
>> .groupBy(0).reduceGroup(new CentroidAccumulator())
>> // compute new centroids from point counts and coordinate sums
>> .map(new CentroidAverager(this.getBenchmarkCounter()));
>> // feed new centroids back into next iteration with termination
>> condition
>> DataSet finalCentroids =
>> loop.closeWith(newCentroids,
>> newCentroids.join(loop).where("*").equalTo("*").filter(new MyFilter()));
>> DataSet> clusteredPoints =
>> points
>> // assign points to final clusters
>> .map(new
>> SelectNearestCenter(-1)).withBroadcastSet(finalCentroids, "centroids");
>> // emit result
>> clusteredPoints.writeAsCsv(outputPath+"/points", "\n", " ");
>> finalCentroids.writeAsText(outputPath+"/centers");//print();
>> // execute program
>> try {
>> env.execute("KMeans Flink");
>> } catch (Exception e) {
>> e.printStackTrace();
>> }
>> }
>>
>> public static final class MyFilter implements
>> FilterFunction<Tuple2<GeoTimeDataCenter, GeoTimeDataCenter>> {
>>
>> private static final long serialVersionUID = 5868635346889117617L;
>>
>> public boolean filter(Tuple2<GeoTimeDataCenter, GeoTimeDataCenter> tuple) throws Exception {
>> if(tuple.f0.equals(tuple.f1)) {
>> return true;
>> }
>> else {
>> return false;
>> }
>> }
>> }
>>
>
>


Travis updates on Github

2015-09-02 Thread Sachin Goel
Hi all
Is there some issue with the Travis integration? The last three pull requests
do not have their build status on the GitHub page. The builds are getting
triggered, though.

Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685


Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Hi all
While using LocalEnvironment, if the program triggers execution several times,
the {{LocalFlinkMiniCluster}} is started just as many times. This can consume a
lot of time in setting up and tearing down the cluster. Further, it interferes
with new functionality I'm working on that is based on persisted results.
One potential solution could be to follow the methodology in
`MultipleProgramsTestBase`. The user code would then have to reside in a method
with a fixed name, instead of in the main method. Or is that too cumbersome?

Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685


Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Yes. That will work too. However, then it isn't possible to shut down the
local cluster. [Is it necessary to do so, or does it shut down automatically
when the program exits? I'm not entirely sure.]

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Wed, Sep 2, 2015 at 7:59 PM, Stephan Ewen  wrote:

> Have a look at some other tests, like the checkpointing tests. They start
> one cluster manually and keep it running. They connect against it using the
> remote environment ("localhost", miniCluster.getJobManagerRpcPort()).
>
> That works nicely...
>
> On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel 
> wrote:
>
>> Hi all
>> While using LocalEnvironment, in case the program triggers execution
>> several times, the {{LocalFlinkMiniCluster}} is started as many times. This
>> can consume a lot of time in setting up and tearing down the cluster.
>> Further, this hinders with a new functionality I'm working on based on
>> persisted results.
>> One potential solution could be to follow the methodology in
>> `MultipleProgramsTestBase`. The user code then would have to reside in a
>> fixed function name, instead of the main method. Or is that too cumbersome?
>>
>> Regards
>> Sachin
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>
>
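
For reference, a minimal sketch of the approach Stephan describes above: start
one cluster, then point every program at it through a remote environment so
repeated executions reuse it. The host and port below are assumptions (6123 is
the default JobManager port); in a test you would use the port from
miniCluster.getJobManagerRpcPort() as mentioned in the thread. The usual
imports (ExecutionEnvironment, MapFunction) are omitted.

    // connect to an already running (mini) cluster instead of spawning a new local one per execute()
    ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment("localhost", 6123);

    // first execution
    env.generateSequence(1, 10).map(new MapFunction<Long, Long>() {
        @Override
        public Long map(Long value) {
            return value * value;
        }
    }).print();

    // second execution reuses the same cluster; no new LocalFlinkMiniCluster is started
    long count = env.generateSequence(1, 10).count();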


Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
I was under the impression that the @AfterClass annotation can only be used
in test classes.
Even so, the idea is that a user program running in the IDE should not be
starting up the cluster several times [my primary concern is the addition
of the persist operator], and we certainly cannot ask the user to terminate
the cluster after execution, while in local mode.

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Wed, Sep 2, 2015 at 9:19 PM, Till Rohrmann  wrote:

> Why is it not possible to shut down the local cluster? Can’t you shut it
> down in the @AfterClass method?
> ​
>
> On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel 
> wrote:
>
>> Yes. That will work too. However, then it isn't possible to shut down the
>> local cluster. [Is it necessary to do so or does it shut down automatically
>> when the program exists? I'm not entirely sure.]
>>
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>> On Wed, Sep 2, 2015 at 7:59 PM, Stephan Ewen  wrote:
>>
>>> Have a look at some other tests, like the checkpointing tests. They
>>> start one cluster manually and keep it running. They connect against it
>>> using the remote environment ("localhost",
>>> miniCluster.getJobManagerRpcPort()).
>>>
>>> That works nicely...
>>>
>>> On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel 
>>> wrote:
>>>
>>>> Hi all
>>>> While using LocalEnvironment, in case the program triggers execution
>>>> several times, the {{LocalFlinkMiniCluster}} is started as many times. This
>>>> can consume a lot of time in setting up and tearing down the cluster.
>>>> Further, this hinders with a new functionality I'm working on based on
>>>> persisted results.
>>>> One potential solution could be to follow the methodology in
>>>> `MultipleProgramsTestBase`. The user code then would have to reside in a
>>>> fixed function name, instead of the main method. Or is that too cumbersome?
>>>>
>>>> Regards
>>>> Sachin
>>>> -- Sachin Goel
>>>> Computer Science, IIT Delhi
>>>> m. +91-9871457685
>>>>
>>>
>>>
>>
>


Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Okay. No problem.
Any suggestions for the correct context though? :')

I don't think something like a {{FlinkProgram}} class is a good idea [the user
would need to override a {{program}} method, and we would make sure the cluster
is set up only once and torn down properly only after the user code finishes
completely].

Without that, however, shutting down the cluster becomes impossible. Can I
assume, though, that the actor system will be shut down automatically when the
main method exits? After all, the JVM will terminate. If so, I can make some
changes in LocalExecutor to start up the cluster only once.

Regards
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Wed, Sep 2, 2015 at 9:33 PM, Till Rohrmann  wrote:

> Oh sorry, then I got the wrong context. I somehow thought it was about
> test cases because I read `MultipleProgramTestBase` etc. Sorry my bad.
>
> On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel 
> wrote:
>
>> I was under the impression that the @AfterClass annotation can only be
>> used in test classes.
>> Even so, the idea is that a user program running in the IDE should not be
>> starting up the cluster several times [my primary concern is the addition
>> of the persist operator], and we certainly cannot ask the user to terminate
>> the cluster after execution, while in local mode.
>>
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>> On Wed, Sep 2, 2015 at 9:19 PM, Till Rohrmann 
>> wrote:
>>
>>> Why is it not possible to shut down the local cluster? Can’t you shut it
>>> down in the @AfterClass method?
>>> ​
>>>
>>> On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel 
>>> wrote:
>>>
>>>> Yes. That will work too. However, then it isn't possible to shut down
>>>> the local cluster. [Is it necessary to do so or does it shut down
>>>> automatically when the program exists? I'm not entirely sure.]
>>>>
>>>> -- Sachin Goel
>>>> Computer Science, IIT Delhi
>>>> m. +91-9871457685
>>>>
>>>> On Wed, Sep 2, 2015 at 7:59 PM, Stephan Ewen  wrote:
>>>>
>>>>> Have a look at some other tests, like the checkpointing tests. They
>>>>> start one cluster manually and keep it running. They connect against it
>>>>> using the remote environment ("localhost",
>>>>> miniCluster.getJobManagerRpcPort()).
>>>>>
>>>>> That works nicely...
>>>>>
>>>>> On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel 
>>>>> wrote:
>>>>>
>>>>>> Hi all
>>>>>> While using LocalEnvironment, in case the program triggers execution
>>>>>> several times, the {{LocalFlinkMiniCluster}} is started as many times. 
>>>>>> This
>>>>>> can consume a lot of time in setting up and tearing down the cluster.
>>>>>> Further, this hinders with a new functionality I'm working on based on
>>>>>> persisted results.
>>>>>> One potential solution could be to follow the methodology in
>>>>>> `MultipleProgramsTestBase`. The user code then would have to reside in a
>>>>>> fixed function name, instead of the main method. Or is that too 
>>>>>> cumbersome?
>>>>>>
>>>>>> Regards
>>>>>> Sachin
>>>>>> -- Sachin Goel
>>>>>> Computer Science, IIT Delhi
>>>>>> m. +91-9871457685
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
If I'm right, all tests use either MultipleProgramsTestBase or
JavaProgramTestBase. Those shut down the cluster explicitly anyway.
I will verify that this is the case.

Regards
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Wed, Sep 2, 2015 at 9:40 PM, Till Rohrmann  wrote:

> Maybe we can create a single PlanExecutor for the LocalEnvironment which
> is used when calling execute. This of course entails that we don’t call
> stop on the LocalCluster. For cases where the program exits after calling
> execute, this should be fine because all resources will then be released
> anyway. It might matter for the test execution where maven reuses the JVMs
> and where the LocalFlinkMiniCluster won’t be garbage collected right
> away. You could try it out and see what happens.
>
> Cheers,
> Till
> ​
>
> On Wed, Sep 2, 2015 at 6:03 PM, Till Rohrmann 
> wrote:
>
>> Oh sorry, then I got the wrong context. I somehow thought it was about
>> test cases because I read `MultipleProgramTestBase` etc. Sorry my bad.
>>
>> On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel 
>> wrote:
>>
>>> I was under the impression that the @AfterClass annotation can only be
>>> used in test classes.
>>> Even so, the idea is that a user program running in the IDE should not
>>> be starting up the cluster several times [my primary concern is the
>>> addition of the persist operator], and we certainly cannot ask the user to
>>> terminate the cluster after execution, while in local mode.
>>>
>>> -- Sachin Goel
>>> Computer Science, IIT Delhi
>>> m. +91-9871457685
>>>
>>> On Wed, Sep 2, 2015 at 9:19 PM, Till Rohrmann 
>>> wrote:
>>>
>>>> Why is it not possible to shut down the local cluster? Can’t you shut
>>>> it down in the @AfterClass method?
>>>> ​
>>>>
>>>> On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel 
>>>> wrote:
>>>>
>>>>> Yes. That will work too. However, then it isn't possible to shut down
>>>>> the local cluster. [Is it necessary to do so or does it shut down
>>>>> automatically when the program exists? I'm not entirely sure.]
>>>>>
>>>>> -- Sachin Goel
>>>>> Computer Science, IIT Delhi
>>>>> m. +91-9871457685
>>>>>
>>>>> On Wed, Sep 2, 2015 at 7:59 PM, Stephan Ewen  wrote:
>>>>>
>>>>>> Have a look at some other tests, like the checkpointing tests. They
>>>>>> start one cluster manually and keep it running. They connect against it
>>>>>> using the remote environment ("localhost",
>>>>>> miniCluster.getJobManagerRpcPort()).
>>>>>>
>>>>>> That works nicely...
>>>>>>
>>>>>> On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel >>>>> > wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>> While using LocalEnvironment, in case the program triggers execution
>>>>>>> several times, the {{LocalFlinkMiniCluster}} is started as many times. 
>>>>>>> This
>>>>>>> can consume a lot of time in setting up and tearing down the cluster.
>>>>>>> Further, this hinders with a new functionality I'm working on based on
>>>>>>> persisted results.
>>>>>>> One potential solution could be to follow the methodology in
>>>>>>> `MultipleProgramsTestBase`. The user code then would have to reside in a
>>>>>>> fixed function name, instead of the main method. Or is that too 
>>>>>>> cumbersome?
>>>>>>>
>>>>>>> Regards
>>>>>>> Sachin
>>>>>>> -- Sachin Goel
>>>>>>> Computer Science, IIT Delhi
>>>>>>> m. +91-9871457685
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
I'm not sure what you mean by "Crucial cleanup is in shutdown hooks". Could
you elaborate?

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Wed, Sep 2, 2015 at 10:25 PM, Stephan Ewen  wrote:

> You can always shut down a cluster manually (via shutdown()) and if the
> JVM simply exists, all is well as well. Crucial cleanup is in shutdown
> hooks.
>
> On Wed, Sep 2, 2015 at 6:22 PM, Till Rohrmann 
> wrote:
>
>> If I'm not mistaken, then the cluster should be properly terminated when
>> it gets garbage collected. Thus, also when the main method exits.
>>
>> On Wed, Sep 2, 2015 at 6:14 PM, Sachin Goel 
>> wrote:
>>
>>> If I'm right, all Tests use either the MultipleProgramTestBase or
>>> JavaProgramTestBase​. Those shut down the cluster explicitly anyway.
>>> I will make sure if this is the case.
>>>
>>> Regards
>>> Sachin
>>>
>>> -- Sachin Goel
>>> Computer Science, IIT Delhi
>>> m. +91-9871457685
>>>
>>> On Wed, Sep 2, 2015 at 9:40 PM, Till Rohrmann 
>>> wrote:
>>>
>>>> Maybe we can create a single PlanExecutor for the LocalEnvironment
>>>> which is used when calling execute. This of course entails that we
>>>> don’t call stop on the LocalCluster. For cases where the program exits
>>>> after calling execute, this should be fine because all resources will then
>>>> be released anyway. It might matter for the test execution where maven
>>>> reuses the JVMs and where the LocalFlinkMiniCluster won’t be garbage
>>>> collected right away. You could try it out and see what happens.
>>>>
>>>> Cheers,
>>>> Till
>>>> ​
>>>>
>>>> On Wed, Sep 2, 2015 at 6:03 PM, Till Rohrmann 
>>>> wrote:
>>>>
>>>>> Oh sorry, then I got the wrong context. I somehow thought it was about
>>>>> test cases because I read `MultipleProgramTestBase` etc. Sorry my bad.
>>>>>
>>>>> On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel 
>>>>> wrote:
>>>>>
>>>>>> I was under the impression that the @AfterClass annotation can only
>>>>>> be used in test classes.
>>>>>> Even so, the idea is that a user program running in the IDE should
>>>>>> not be starting up the cluster several times [my primary concern is the
>>>>>> addition of the persist operator], and we certainly cannot ask the user 
>>>>>> to
>>>>>> terminate the cluster after execution, while in local mode.
>>>>>>
>>>>>> -- Sachin Goel
>>>>>> Computer Science, IIT Delhi
>>>>>> m. +91-9871457685
>>>>>>
>>>>>> On Wed, Sep 2, 2015 at 9:19 PM, Till Rohrmann 
>>>>>> wrote:
>>>>>>
>>>>>>> Why is it not possible to shut down the local cluster? Can’t you
>>>>>>> shut it down in the @AfterClass method?
>>>>>>> ​
>>>>>>>
>>>>>>> On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel <
>>>>>>> sachingoel0...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Yes. That will work too. However, then it isn't possible to shut
>>>>>>>> down the local cluster. [Is it necessary to do so or does it shut down
>>>>>>>> automatically when the program exists? I'm not entirely sure.]
>>>>>>>>
>>>>>>>> -- Sachin Goel
>>>>>>>> Computer Science, IIT Delhi
>>>>>>>> m. +91-9871457685
>>>>>>>>
>>>>>>>> On Wed, Sep 2, 2015 at 7:59 PM, Stephan Ewen 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Have a look at some other tests, like the checkpointing tests.
>>>>>>>>> They start one cluster manually and keep it running. They connect 
>>>>>>>>> against
>>>>>>>>> it using the remote environment ("localhost",
>>>>>>>>> miniCluster.getJobManagerRpcPort()).
>>>>>>>>>
>>>>>>>>> That works nicely...
>>>>>>>>>
>>>>>>>>> On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel <
>>>>>>>>> sachingoel0...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all
>>>>>>>>>> While using LocalEnvironment, in case the program triggers
>>>>>>>>>> execution several times, the {{LocalFlinkMiniCluster}} is started as 
>>>>>>>>>> many
>>>>>>>>>> times. This can consume a lot of time in setting up and tearing down 
>>>>>>>>>> the
>>>>>>>>>> cluster. Further, this hinders with a new functionality I'm working 
>>>>>>>>>> on
>>>>>>>>>> based on persisted results.
>>>>>>>>>> One potential solution could be to follow the methodology in
>>>>>>>>>> `MultipleProgramsTestBase`. The user code then would have to reside 
>>>>>>>>>> in a
>>>>>>>>>> fixed function name, instead of the main method. Or is that too 
>>>>>>>>>> cumbersome?
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Sachin
>>>>>>>>>> -- Sachin Goel
>>>>>>>>>> Computer Science, IIT Delhi
>>>>>>>>>> m. +91-9871457685
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>


Re: Convergence Criterion in IterativeDataSet

2015-09-04 Thread Sachin Goel
Hi Andres
Does something like this solve what you're trying to achieve?
https://github.com/apache/flink/pull/918/files

Regards
Sachin
On Sep 4, 2015 6:24 PM, "Stephan Ewen"  wrote:

> I think you can do this with the current interface. The convergence
> criterion object stays around, so you should be able to simply store the
> current aggregator value in a field (when the check is invoked). Any round
> but the first could compare against that field.
>
> On Fri, Sep 4, 2015 at 2:25 PM, Andres R. Masegosa 
> wrote:
>
>> Hi,
>>
>>
>> I trying to implement some machine learning algorithms that involve
>> several iterations until convergence (to a fixed point).
>>
>> My idea is to use a IterativeDataSet with an Aggregator which produces
>> the result (i.e. a set of parameters defining the model).
>>
>> From the interface "ConvergenceCriterion", I can understand that the
>> convergence criterion only depends on the result of the aggregator in
>> the current iteration (as happens with the DoubleZeroConvergence class).
>>
>> However, it is more usual to test convergence by comparing the result of
>> the aggregator in the current iteration with the result of the
>> aggregator in the previous iteration (one usually stops when both
>> results are similar enough and we have converged to a fixed point).
>>
>> I guess this functionality is not included yet. And this is because the
>> convergence criteria of flink implementations of K-Means and Linear
>> Regression is to stop after a fixed number of iterations.
>>
>>
>> Am I wrong?
>>
>>
>> Regards
>> Andres
>>
>
>
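
To make Stephan's suggestion above concrete, here is a sketch of a convergence
criterion that keeps the previous aggregate in a field and compares against it
in later rounds. The aggregator name, the tolerance, and the use of a
DoubleValue aggregate are assumptions; the usual imports
(org.apache.flink.api.common.aggregators.*, org.apache.flink.types.DoubleValue)
are omitted.

    public class FixedPointConvergence implements ConvergenceCriterion<DoubleValue> {

        private static final double TOLERANCE = 1e-6;  // how close two rounds must be
        private double previous = Double.NaN;          // nothing to compare against in round 1

        @Override
        public boolean isConverged(int iteration, DoubleValue value) {
            double current = value.getValue();
            boolean converged = !Double.isNaN(previous)
                    && Math.abs(current - previous) < TOLERANCE;
            previous = current;                        // the criterion object stays around between rounds
            return converged;
        }
    }

    // registration on the iteration, e.g. together with a DoubleSumAggregator:
    // loop.registerAggregationConvergenceCriterion("model.score", new DoubleSumAggregator(),
    //         new FixedPointConvergence());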


Re: Distribute DataSet to subset of nodes

2015-09-13 Thread Sachin Goel
Hi Stefan
Just a clarification: the output corresponding to an element, computed over the
whole data, will be the union of the outputs computed over the two halves. Is
this what you're trying to achieve? [It appears to be, since every flatMap
task will independently produce outputs.]

In that case, one solution could be to simply have two flatMap operations
based on parts of the *broadcast* data set, and take a union.
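
A rough Java sketch of that, assuming each half of the preprocessed data fits
in memory as a broadcast set; the LookupFlatMap below and the data sets feeding
it are stand-ins for your own operator and inputs, and the usual imports
(RichFlatMapFunction, Configuration, Collector, Set, HashSet) are omitted:

    public static final class LookupFlatMap extends RichFlatMapFunction<Long, String> {

        private Set<String> lookup;

        @Override
        public void open(Configuration parameters) {
            // load the broadcast half here instead of reading it from disk
            lookup = new HashSet<String>(getRuntimeContext().<String>getBroadcastVariable("lookup"));
        }

        @Override
        public void flatMap(Long value, Collector<String> out) {
            if (lookup.contains(String.valueOf(value))) {   // whatever per-element work you need
                out.collect("matched " + value);
            }
        }
    }

    DataSet<Long> input = env.generateSequence(1, 1000);
    DataSet<String> half1 = env.fromElements("1", "2", "3");        // stand-in for the first half
    DataSet<String> half2 = env.fromElements("998", "999", "1000"); // stand-in for the second half

    DataSet<String> out1 = input.flatMap(new LookupFlatMap()).withBroadcastSet(half1, "lookup");
    DataSet<String> out2 = input.flatMap(new LookupFlatMap()).withBroadcastSet(half2, "lookup");

    // each element is processed once against each half, so the union equals what a single
    // flatMap over the full (too large) data would have produced
    DataSet<String> union = out1.union(out2);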

Cheers
Sachin
On Sep 13, 2015 7:04 PM, "Stefan Bunk"  wrote:

> Hi!
>
> Following problem: I have 10 nodes on which I want to execute a flatMap
> operator on a DataSet. In the open method of the operator, some data is
> read from disk and preprocessed, which is necessary for the operator.
> Problem is, the data does not fit in memory on one node, however, half of
> the data does.
> So in five out of ten nodes, I stored one half of the data to be read in
> the open method, and the other half on the other five nodes.
>
> Now my question: How can I distribute my DataSet, so that each element is
> sent once to a node with the first half of my data and once to a node with
> the other half?
>
> I looked at implementing a custom partitioner, however my problems were:
> (i) I have no mapping from the number I am supposed to return to the nodes
> to the data. How do I know, that index 5 contains one half of the data, and
> index 6 the other half?
> (ii) I do not know the current index. Obviously, I want to send my DataSet
> element only once over the network.
>
> Best,
> Stefan
>


Re: "Not enough free slots available to run the job" for word count example

2015-09-13 Thread Sachin Goel
Hi Daniel
Your problem did get solved, I assume.

As for the -p flag, it determines the default parallelism of operators at
runtime. If you specify a value higher than the number of available slots,
that's an issue. Hope that helped.

Cheers
Sachin
On Sep 13, 2015 9:13 PM, "Daniel Blazevski" 
wrote:

> Hello,
>
> I am not sure if I can give updates to an email I sent to the user list
> before getting any response, but here is a quick update:
>
> I tried to run using one processor:
> ./bin/flink run -p 1 ./examples/flink-java-examples-0.9.1-WordCount.jar
>
> and that worked. It seems to be an issue with configuring the other
> workers.
>
> I further realized that the reason there were only 5 processing slots on
> the Dashboard was that I had only changed flink-conf.yaml on the master node.
> I have since changed it on all workers as well, and the Dashboard now shows
> 8 processing slots.
>
> I stopped and re-started the cluster, and the example runs (even w/o the
> -p 1 setting)
>
> Best,
> Dan
>
>
>
> On Sun, Sep 13, 2015 at 10:40 AM, Daniel Blazevski <
> daniel.blazev...@gmail.com> wrote:
>
>> Hello,
>>
>> I am new to Flink, I setup a Flink cluster on 4 m4.large Amazon EC2
>> instances, and set the following in link-conf.yaml:
>>
>> jobmanager.heap.mb: 4000
>> taskmanager.heap.mb: 5000
>> taskmanager.numberOfTaskSlots: 2
>> parallelism.default: 8
>>
>> In the 8081 dashboard, it shows 4 for Task Manager and 5 for Processing
>> Slots ( I’m not sure if “5” is OK here?).
>>
>> I then tried to execute:
>>
>> ./bin/flink run ./examples/flink-java-examples-0.9.1-WordCount.jar
>>
>> and got the following error message:
>> Error: java.lang.IllegalStateException: Could not schedule consumer
>> vertex CHAIN Reduce (SUM(1), at main(WordCount.java:72) -> FlatMap
>> (collect()) (7/8)
>> at
>> org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:482)
>> at
>> org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:472)
>> at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
>> at
>> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
>> at
>> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
>> at
>> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> Caused by:
>> org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException:
>> Not enough free slots available to run the job. You can decrease the
>> operator parallelism or increase the number of slots per TaskManager in the
>> configuration. Task to schedule: < Attempt #0 (CHAIN Reduce (SUM(1), at
>> main(WordCount.java:72) -> FlatMap (collect()) (7/8)) @ (unassigned) -
>> [SCHEDULED] > with groupID < 6adebf08c73e7f3adb6ea20f8950d627 > in sharing
>> group < SlotSharingGroup [02cac542946daf808c406c2b18e252e0,
>> d883aa4274b6cef49ab57aaf3078147c, 6adebf08c73e7f3adb6ea20f8950d627] >.
>> Resources available to scheduler: Number of instances=4, total number of
>> slots=5, available slots=0
>> at
>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:251)
>> at
>> org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleImmediately(Scheduler.java:126)
>> at
>> org.apache.flink.runtime.executiongraph.Execution.scheduleForExecution(Execution.java:271)
>> at
>> org.apache.flink.runtime.executiongraph.ExecutionVertex.scheduleForExecution(ExecutionVertex.java:430)
>> at
>> org.apache.flink.runtime.executiongraph.Execution$3.call(Execution.java:478)
>> ... 9 more
>>
>>
>> More details about my setup: I am running Ubuntu on the master node and 3
>> data nodes.  If it matters, I already had hadoop 2.7.1 running and
>> downloaded and installed the latest version of Flink, which is technically
>> for hadoop 2.7.0.
>>
>> Thanks,
>> Dan
>>
>
>


Re: Distribute DataSet to subset of nodes

2015-09-13 Thread Sachin Goel
Of course, someone else might have better ideas regarding the partitioner. :)
On Sep 14, 2015 1:12 AM, "Sachin Goel"  wrote:

> Hi Stefan
> Just a clarification : The output corresponding to an element based on the
> whole data will be a union of the outputs based on the two halves. Is this
> what you're trying to achieve? [It appears  like that since every flatMap
> task will independently produce outputs.]
>
> In that case, one solution could be to simply have two flatMap operations
> based on parts of the *broadcast* data set, and take a union.
>
> Cheers
> Sachin
> On Sep 13, 2015 7:04 PM, "Stefan Bunk"  wrote:
>
>> Hi!
>>
>> Following problem: I have 10 nodes on which I want to execute a flatMap
>> operator on a DataSet. In the open method of the operator, some data is
>> read from disk and preprocessed, which is necessary for the operator.
>> Problem is, the data does not fit in memory on one node, however, half of
>> the data does.
>> So in five out of ten nodes, I stored one half of the data to be read in
>> the open method, and the other half on the other five nodes.
>>
>> Now my question: How can I distribute my DataSet, so that each element is
>> sent once to a node with the first half of my data and once to a node with
>> the other half?
>>
>> I looked at implementing a custom partitioner, however my problems were:
>> (i) I have no mapping from the number I am supposed to return to the
>> nodes to the data. How do I know, that index 5 contains one half of the
>> data, and index 6 the other half?
>> (ii) I do not know the current index. Obviously, I want to send my
>> DataSet element only once over the network.
>>
>> Best,
>> Stefan
>>
>


Stuck builds on travis

2015-10-09 Thread Sachin Goel
These two builds are stuck on Travis. It seems to be a Travis issue, and it is
limiting the number of concurrent builds to 3.
https://travis-ci.org/apache/flink/jobs/84317317
https://travis-ci.org/apache/flink/jobs/84405887

Perhaps someone from infra should cancel them.

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685


Re: Stuck builds on travis

2015-10-09 Thread Sachin Goel
Found another one: https://travis-ci.org/apache/flink/jobs/84473635

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Fri, Oct 9, 2015 at 7:06 PM, Sachin Goel 
wrote:

> These two builds are stuck on travis. It seems to a travis issue and is
> limiting the number of concurrent builds to 3.
> https://travis-ci.org/apache/flink/jobs/84317317
> https://travis-ci.org/apache/flink/jobs/84405887
>
> Perhaps someone from infra should cancel them.
>
> -- Sachin Goel
> Computer Science, IIT Delhi
> m. +91-9871457685
>


Re: Bug in Hybrid Hash Join

2015-10-12 Thread Sachin Goel
Hi Flavio
This was addressed in FLINK-2763. Can you check whether you're using the latest
version?

Cheers!
Sachin
On Oct 13, 2015 5:04 AM, "Flavio Pompermaier"  wrote:

> Hi guys,
> my job fails on Flink 0.10-SNAPSHOT with the following message: "Bug in
> Hybrid Hash Join: Request to spill a partition with less than two buffers." I
> tried to double the network buffers but I still got the error :(
>
> Is it a real bug or am I doing something wrong?
>
> See you very soon :)
> Flavio
>


Re: Bug in Hybrid Hash Join

2015-10-13 Thread Sachin Goel
Okay, great!
Please re-open the JIRA issue in case the problem hasn't been resolved.

Cheers!
Sachin

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Tue, Oct 13, 2015 at 4:00 PM, Flavio Pompermaier 
wrote:

> I ensured that I was using the latest version, and I showed the program to
> Fabian just now.
> The stack trace is:
>
> Caused by: java.lang.RuntimeException: Bug in Hybrid Hash Join: Request to
> spill a partition with less than two buffers.
> at
> org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:301)
> at
> org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1108)
> at
> org.apache.flink.runtime.operators.hash.MutableHashTable.nextSegment(MutableHashTable.java:1277)
> at
> org.apache.flink.runtime.operators.hash.HashPartition$BuildSideBuffer.nextSegment(HashPartition.java:523)
> at
> org.apache.flink.runtime.memory.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
> at
> org.apache.flink.runtime.memory.AbstractPagedOutputView.write(AbstractPagedOutputView.java:201)
> at
> org.apache.flink.api.java.typeutils.runtime.DataOutputViewStream.write(DataOutputViewStream.java:39)
> at com.esotericsoftware.kryo.io.Output.flush(Output.java:163)
> at
> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.serialize(KryoSerializer.java:187)
> at
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:116)
> at
> org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30)
> at
> org.apache.flink.runtime.operators.hash.HashPartition.insertIntoBuildBuffer(HashPartition.java:256)
> at
> org.apache.flink.runtime.operators.hash.MutableHashTable.insertIntoTable(MutableHashTable.java:856)
> at
> org.apache.flink.runtime.operators.hash.MutableHashTable.buildInitialTable(MutableHashTable.java:685)
> at
> org.apache.flink.runtime.operators.hash.MutableHashTable.open(MutableHashTable.java:443)
> at
> org.apache.flink.runtime.operators.hash.NonReusingBuildSecondHashMatchIterator.open(NonReusingBuildSecondHashMatchIterator.java:85)
> at
> org.apache.flink.runtime.operators.JoinDriver.prepare(JoinDriver.java:195)
> at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:459)
> ... 3 more
>
>
> On Tue, Oct 13, 2015 at 5:20 AM, Sachin Goel 
> wrote:
>
>> Hi Flavio
>> This was addressed in Flink-2763. Can you check if you're using the
>> latest version?
>>
>> Cheers!
>> Sachin
>> On Oct 13, 2015 5:04 AM, "Flavio Pompermaier" 
>> wrote:
>>
>>> Hi guys,
>>> my job fails on Flink 0.10-snapshot with the following message: Bug in
>>> Hybrid Hash Join: Request to spill a partition with less than two buffers.I
>>> tried to double network buffers but I still got the error :(
>>>
>>> Is it a real bug or am I doing something wrong?
>>>
>>> See you very soon :)
>>> Flavio
>>>
>>
>


Re: Running continuously on yarn with kerberos

2015-11-07 Thread Sachin Goel
Usually, if all the dependencies are being downloaded, i.e., on the first
build, it'll likely take 30-40 minutes. Subsequent builds might take 10
minutes approx. [I have the same PC configuration.]

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes  wrote:

> How long should this take if you have HDD and about 8GB of RAM?
> Is that 10 minutes? 20?
>
> Niels
>
> On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen  wrote:
>
>> Hi Niels!
>>
>> Usually, you simply build the binaries by invoking "mvn -DskipTests clean
>> package" in the root flink directory. The resulting program should be in
>> the "build-target" directory.
>>
>> If the program gets stuck, let us know where and what the last message on
>> the command line is.
>>
>> Please be aware that the final step of building the "flink-dist" project
>> may take a while, especially on systems with hard disks (as opposed to
>> SSDs) and a comparatively low amount of memory. The reason is that the
>> building of the final JAR file is quite expensive, because the system
>> re-packages certain libraries in order to avoid conflicts between different
>> versions.
>>
>> Stephan
>>
>>
>> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes  wrote:
>>
>>> Hi,
>>>
>>> Excellent.
>>> What you can help me with are the commands to build the binary
>>> distribution from source.
>>> I tried it last Thursday and the build seemed to get stuck at some point
>>> (at the end of/just after building the dist module).
>>> I haven't been able to figure out why yet.
>>>
>>> Niels
>>> On 5 Nov 2015 14:57, "Maximilian Michels"  wrote:
>>>
>>>> Thank you for looking into the problem, Niels. Let us know if you need
>>>> anything. We would be happy to merge a pull request once you have verified
>>>> the fix.
>>>>
>>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes  wrote:
>>>>
>>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger 
>>>>> wrote:
>>>>>
>>>>>> Hi Niels,
>>>>>> thank you for analyzing the issue so properly. I agree with you. It
>>>>>> seems that HDFS and HBase are using their own tokes which we need to
>>>>>> transfer from the client to the YARN containers. We should be able to 
>>>>>> port
>>>>>> the fix from Spark (which they got from Storm) into our YARN client.
>>>>>> I think we would add this in org.apache.flink.yarn.Utils#
>>>>>> setTokensFor().
>>>>>>
>>>>>> Do you want to implement and verify the fix yourself? If you are to
>>>>>> busy at the moment, we can also discuss how we share the work (I'm
>>>>>> implementing it, you test the fix)
>>>>>>
>>>>>>
>>>>>> Robert
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes  wrote:
>>>>>>
>>>>>>> Update on the status so far: I suspect I found a problem in a
>>>>>>> secure setup.
>>>>>>>
>>>>>>> I have created a very simple Flink topology consisting of a
>>>>>>> streaming Source (the outputs the timestamp a few times per second) and 
>>>>>>> a
>>>>>>> Sink (that puts that timestamp into a single record in HBase).
>>>>>>> Running this on a non-secure Yarn cluster works fine.
>>>>>>>
>>>>>>> To run it on a secured Yarn cluster my main routine now looks like
>>>>>>> this:
>>>>>>>
>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>> UserGroupInformation.loginUserFromKeytab("nbas...@xx.net", 
>>>>>>> "/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>>
>>>>>>> final StreamExecutionEnvironment env = 
>>>>>>> StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>> env.setParallelism(1);
>>>>>>>
>>>>>>> DataStream stream = env.addSo

Running into memory issues while running on Yarn

2017-01-05 Thread Sachin Goel
Hey!

I'm running locally under this configuration (copied from the nodemanager logs):
physical-memory=8192 virtual-memory=17204 virtual-cores=8

Before starting a Flink deployment, memory usage stats show 3.7 GB used on the
system, indicating plenty of free memory for Flink containers.
However, after I submit with minimal resource requirements,
./yarn-session.sh -n 1 -tm 768, the cluster deploys successfully, but then
every application on the system receives a SIGTERM and it basically kills the
current user session, logging me out of the system.

The JobManager and TaskManager logs contain only the information that a
SIGTERM was received and that they shut down gracefully.
All YARN and DFS processes log the receipt of a SIGTERM.

Here's the relevant log from nodemanager:

2017-01-05 17:00:06,089 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1483603191971_0002_01_02 transitioned from
LOCALIZED to RUNNING
2017-01-05 17:00:06,092 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
launchContainer: [bash,
/opt/hadoop-2.7.3/tmp/nm-local-dir/usercache/kirk/appcache/application_1483603191971_0002/container_1483603191971_0002_01_02/default_container_executor.sh]
2017-01-05 17:00:08,731 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Starting resource-monitoring for
container_1483603191971_0002_01_02
2017-01-05 17:00:08,744 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 17872 for container-id
container_1483603191971_0002_01_01: 282.7 MB of 1 GB physical
memory used; 2.1 GB of 2.1 GB virtual memory used
2017-01-05 17:00:08,744 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Process tree for container: container_1483603191971_0002_01_01 has
processes older than 1 iteration running over the configured limit.
Limit=2254857728, current usage = 2255896576
2017-01-05 17:00:08,745 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=17872,containerID=container_1483603191971_0002_01_01]
is running beyond virtual memory limits. Current usage: 282.7 MB of 1
GB physical memory used; 2.1 GB of 2.1 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1483603191971_0002_01_01 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES)
FULL_CMD_LINE
|- 17872 17870 17872 17872 (bash) 0 0 21409792 812 /bin/bash -c
/usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M
-Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_01/jobmanager.log
-Dlogback.configurationFile=file:logback.xml
-Dlog4j.configuration=file:log4j.properties
org.apache.flink.yarn.YarnApplicationMasterRunner
1>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_01/jobmanager.out
2>/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_01/jobmanager.err
|- 17879 17872 17872 17872 (java) 748 20 2234486784 71553
/usr/lib/jvm/java-8-openjdk-amd64//bin/java -Xmx424M
-Dlog.file=/opt/hadoop-2.7.3/logs/userlogs/application_1483603191971_0002/container_1483603191971_0002_01_01/jobmanager.log
-Dlogback.configurationFile=file:logback.xml
-Dlog4j.configuration=file:log4j.properties
org.apache.flink.yarn.YarnApplicationMasterRunner

2017-01-05 17:00:08,745 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Removed ProcessTree with root 17872
2017-01-05 17:00:08,746 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl:
Container container_1483603191971_0002_01_01 transitioned from
RUNNING to KILLING
2017-01-05 17:00:08,746 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1483603191971_0002_01_01
2017-01-05 17:00:08,779 ERROR
org.apache.hadoop.yarn.server.nodemanager.NodeManager: RECEIVED SIGNAL
15: SIGTERM
2017-01-05 17:00:08,822 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exit code from container container_1483603191971_0002_01_01 is :
143
2017-01-05 17:00:08,825 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Exit code from container container_1483603191971_0002_01_02 is :
143


Is the memory available on my PC not enough, or are there any known issues
which might lead to this?

Also, this doesn't occur every time I start a Flink session.

Thanks
Sachin


Re: REST api: how to upload jar?

2017-01-24 Thread Sachin Goel
Hey Cliff
You can upload a jar file via an HTTP POST with the file data sent under the
form field 'jarfile'.
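
For example, a plain-Java sketch (no extra libraries) of such a POST; the
host/port, the /jars/upload path, and the jar location are assumptions based on
what the web frontend itself does, so please double-check them against your
setup:

    import java.io.DataOutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class JarUploader {
        public static void main(String[] args) throws Exception {
            String boundary = "----flinkJarUpload" + System.currentTimeMillis();
            URL url = new URL("http://localhost:8081/jars/upload");          // assumed web frontend address
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setDoOutput(true);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

            byte[] jar = Files.readAllBytes(Paths.get("target/my-job.jar")); // hypothetical jar path
            try (DataOutputStream out = new DataOutputStream(conn.getOutputStream())) {
                out.writeBytes("--" + boundary + "\r\n");
                // the file must be sent under the form field name 'jarfile'
                out.writeBytes("Content-Disposition: form-data; name=\"jarfile\"; "
                        + "filename=\"my-job.jar\"\r\n");
                out.writeBytes("Content-Type: application/x-java-archive\r\n\r\n");
                out.write(jar);
                out.writeBytes("\r\n--" + boundary + "--\r\n");
            }
            System.out.println("Upload returned HTTP " + conn.getResponseCode());
        }
    }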

Can you also please open a JIRA issue for fixing the documentation?

- Sachin


On Jan 25, 2017 06:55, "Cliff Resnick"  wrote:

> The 1.2 release documentation (https://ci.apache.org/project
> s/flink/flink-docs-release-1.2/monitoring/rest_api.html)  states "It is
> possible to upload, run, and list Flink programs via the REST APIs and web
> frontend". However there is no documentation about uploading a jar via REST
> api. Does this mean that upload is only supported via the web frontend?  I
> did notice that if I manually upload a jar to the configured upload dir an
> prepend its name with a uuid it does get recognized and I can POST a job
> start, but this is messy and I'd rather use the api if supported.
>
> -Cliff
>