Hi, I was trying to run my program in the Flink web environment (the local one).
When I run it I get the graph of the planned execution, but each node shows
"parallelism = 1", whereas I think it runs with parallelism = 8 (8 cores, and I
always get 8 outputs).
What does that mean?
Is that wrong, or is it
Something similar in flink-0.10-SNAPSHOT:
06/29/2015 10:33:46 CHAIN Join(Join at main(TriangleCount.java:79)) ->
Combine (Reduce at main(TriangleCount.java:79))(222/224) switched to FAILED
java.lang.Exception: The slot in which the task was executed has been
released. Probably loss of TaskManager
OK, thanks!
Then for now I will use it until the true outer join is ready.
On 29 Jun 2015, at 18:22, Fabian Hueske <fhue...@gmail.com> wrote:
Yes, if you need outer join semantics you have to go with CoGroup.
Some members of the Flink community are working on true outer joins for
Flink, but I don't know what the progress is.
Best, Fabian
2015-06-29 18:05 GMT+02:00 Michele Bertoni:
Thanks both for answering,
that's what I expected.
I was using join at first, but sadly I had to move from join to coGroup because
I need an outer join.
The alternative to the coGroup is to "complete" the inner join by extracting from
the original dataset, by difference, what did not match in the coGroup.
If you just want to do the pairwise comparison try join().
Join is an inner join and will give you all pairs of elements with matching
keys.
For CoGroup, there is no other way than collecting one side in memory.
Best, Fabian
2015-06-29 17:42 GMT+02:00 Matthias J. Sax:
Why don't you use a join? CoGroup does not seem to be the right operator.
-Matthias
On 06/29/2015 05:40 PM, Michele Bertoni wrote:
Hi, I have a question on coGroup:
when I coGroup two datasets, is there a way to compare each element on the left
with each element on the right (inside a group) without collecting one side?
Right now I am doing:
left.coGroup(right).where(0, 1, 2).equalTo(0, 1, 2) {
  (leftIterator, rightIterator, ...
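To make the outer-join semantics discussed here concrete, below is a minimal plain-Java sketch (no Flink dependency; the class and method names are made up for illustration) of what a coGroup-based full outer join computes per key: every left value is paired with every right value, and unmatched sides are paired with null.

```java
import java.util.*;

// Sketch: emulate a coGroup-style full outer join over two key->values maps.
public class CoGroupOuterJoinSketch {
    // For each key present on either side, pair every left value with every
    // right value; an empty side yields rows with null (outer-join semantics).
    static List<String[]> outerJoin(Map<String, List<String>> left,
                                    Map<String, List<String>> right) {
        Set<String> keys = new TreeSet<>();
        keys.addAll(left.keySet());
        keys.addAll(right.keySet());
        List<String[]> out = new ArrayList<>();
        for (String k : keys) {
            List<String> l = left.getOrDefault(k, Collections.emptyList());
            List<String> r = right.getOrDefault(k, Collections.emptyList());
            if (l.isEmpty()) {
                for (String rv : r) out.add(new String[]{k, null, rv});
            } else if (r.isEmpty()) {
                for (String lv : l) out.add(new String[]{k, lv, null});
            } else {
                for (String lv : l)
                    for (String rv : r) out.add(new String[]{k, lv, rv});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> left = new HashMap<>();
        left.put("a", Arrays.asList("L1"));
        left.put("b", Arrays.asList("L2"));
        Map<String, List<String>> right = new HashMap<>();
        right.put("a", Arrays.asList("R1", "R2"));
        right.put("c", Arrays.asList("R3"));
        for (String[] row : outerJoin(left, right))
            System.out.println(row[0] + "," + row[1] + "," + row[2]);
    }
}
```

A plain join() would emit only the rows where both sides are non-empty, which is why the coGroup was needed here.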
I think there's actually an Exception thrown within the code that, I
suspect, is not reported anywhere... could it be?
On Mon, Jun 29, 2015 at 3:28 PM, Flavio Pompermaier wrote:
Ah, thank you!
- If you create a data set from a Java/Scala collection, this data source
has a parallelism of one.
- The map function is chained to that source, so it runs with parallelism
one as well.
- To run it with a higher parallelism, use "setParallelism(...)" on the
map function, or call "
Yeah, sorry. I would like to do something simple like this, but using
Java threads.
DataSet<...> input = env.fromCollection(in);
DataSet<...> output = input.map(new HighWorkLoad());
ArrayList<...> result = output.consume(); // ? like collect, but in
parallel: some operation that consumes the pipeline.
return result;
It is not quite easy to understand what you are trying to do.
Can you post your program here? Then we can take a look and give you a good
answer...
On Mon, Jun 29, 2015 at 3:47 PM, Juan Fumero <
juan.jose.fumero.alfo...@oracle.com> wrote:
Is there any other way to apply the function in parallel and return the
result to the client in parallel?
Thanks
Juan
On Mon, 2015-06-29 at 15:01 +0200, Stephan Ewen wrote:
Which file and which JVM options do I have to modify to try options 1 and
3..?
1. Don't fill the JVMs up to the limit with objects. Give more memory to
the JVM, or give less memory to Flink managed memory.
2. Use more JVMs, i.e., a higher parallelism.
3. Use a concurrent garbage collector.
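As a hedged sketch of where such settings typically live (the exact keys depend on your Flink version, and the values below are purely illustrative), the three options map roughly to these entries in conf/flink-conf.yaml:

```yaml
# conf/flink-conf.yaml -- illustrative values, check your version's docs
taskmanager.heap.mb: 4096                 # option 1: more memory for the JVM
taskmanager.memory.fraction: 0.5          # option 1: less for Flink managed memory
parallelism.default: 8                    # option 2: higher default parallelism
env.java.opts: "-XX:+UseConcMarkSweepGC"  # option 3: a concurrent garbage collector
```

After changing the file, the TaskManagers have to be restarted for the settings to take effect.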
In general, avoid collect() if you can. Collect brings data to the client,
where the computation is not parallel any more.
Try to do as much on the DataSet as possible.
On Mon, Jun 29, 2015 at 2:58 PM, Juan Fumero <
juan.jose.fumero.alfo...@oracle.com> wrote:
Hi Stephan,
so should I use another method instead of collect? It seems
multithreading is not working with this.
Juan
On Mon, 2015-06-29 at 14:51 +0200, Stephan Ewen wrote:
Hi Juan!
This is an artifact of a workaround right now. The actual collect() logic
happens in the flatMap() and the sink is a dummy that executes nothing. The
flatMap writes the data to be collected to the "accumulator" that delivers
it back.
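As a rough plain-Java sketch of that workaround (no Flink dependency; all names are made up for illustration): the sink is a no-op, while the flatMap-like stage writes every record into an "accumulator" that is what actually gets delivered back to the client.

```java
import java.util.*;
import java.util.function.Function;

// Sketch of the collect() workaround: the real work happens in a flatMap-like
// stage that appends every record to an "accumulator"; the sink discards data.
public class CollectSketch {
    static <I, O> List<O> collectViaAccumulator(List<I> input, Function<I, O> mapFn) {
        List<O> accumulator = new ArrayList<>();   // stands in for the accumulator
        for (I record : input) {
            O result = mapFn.apply(record);
            accumulator.add(result);               // the flatMap writes here
            // the downstream "sink" would receive `result` and do nothing
        }
        return accumulator;                        // delivered back to the client
    }

    public static void main(String[] args) {
        List<Integer> out = collectViaAccumulator(Arrays.asList(1, 2, 3), x -> x * 10);
        System.out.println(out);   // [10, 20, 30]
    }
}
```

The important point for the earlier thread is the last step: the accumulated result lives on the client, so anything done with it afterwards is no longer parallel.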
Greetings,
Stephan
On Mon, Jun 29, 2015 at 2:30 PM,
Hi,
I am starting with Flink. I have tried to look at the documentation,
but I haven't found it clear.
I wonder about the difference between these two states:
FlatMap RUNNING vs. DataSink RUNNING.
Is FlatMap doing any data transformation? Compilation? At which
point is it actually executing the f
Hi Flavio!
I had a look at the logs. There seems nothing suspicious - at some point,
the TaskManager and JobManager declare each other unreachable.
A pretty common cause for that is that the JVMs stall for a long time due
to garbage collection. The JobManager cannot see the difference between a
J
I witnessed a similar issue yesterday on a simple job (single task chain,
no shuffles) with a release-0.9 based fork.
2015-04-15 14:59 GMT+02:00 Flavio Pompermaier:
> Yes, sorry for that... I found it somewhere in the logs... the problem was
> that the program didn't die immediately but was somehow
OK... I don't like repeating myDir twice, but for the moment I can live with
it ;)
Thanks,
Flavio
On Mon, Jun 29, 2015 at 12:41 PM, Stephan Ewen wrote:
You can also do "myDir.getFileSystem().exists(myDir)", but I don't think
there is a shorter way...
On Mon, Jun 29, 2015 at 12:39 PM, Flavio Pompermaier wrote:
Hi to all,
in my job I have to check if a directory exists and currently I have to
write:
Path myDir = new Path(...);
boolean exists = FileSystem.get(myDir.toUri()).exists(myDir);
Is there a better way to achieve this?
Best,
Flavio
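For comparison, here is a minimal self-contained sketch with plain java.nio (an assumption on my part: this only covers local paths, whereas the FileSystem-based check in the question also works for HDFS and the other filesystems Flink supports):

```java
import java.io.IOException;
import java.nio.file.*;

// Check whether a directory exists on the local filesystem -- a java.nio
// analogue of FileSystem.get(uri).exists(path) for local paths only.
public class DirExists {
    static boolean dirExists(String dir) {
        Path p = Paths.get(dir);
        return Files.exists(p) && Files.isDirectory(p);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("flink-dir-check");
        System.out.println(dirExists(tmp.toString()));                    // true
        System.out.println(dirExists(tmp.resolve("missing").toString())); // false
    }
}
```

For anything that may live on a distributed filesystem, the Flink FileSystem API from the question remains the right tool.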
Hi Flavio!
Can you post the JobManager's log here? It should have the message about
what is going wrong...
Stephan
On Mon, Jun 29, 2015 at 11:43 AM, Flavio Pompermaier wrote:
Hi Flinksters,
we recently had a discussion in our working group about which language we should
use with Flink. To bring it to the point: most people would like to use
Python because they are familiar with it and there is a nice scientific
stack to, e.g., print and analyse the results. But our concern is t
Hi to all,
I'm restarting the discussion about a problem I already discussed on this
mailing list (but that started with a different subject).
I'm running Flink 0.9.0 on CDH 5.1.3 so I compiled the sources as:
mvn clean install -Dhadoop.version=2.3.0-cdh5.1.3
-Dhbase.version=0.98.1-cdh5.1.3 -Dhado
@Stephan: I don't think there is a way to deal with this. In my
understanding, the (main) purpose of the user@ list is not to report Flink
bugs. It is a forum for users to help each other.
Flink committers happen to know a lot about the system, so it's easy for
them to help users. Also, it's a good w
Hi Hawin!
The performance tuning of Kafka is much trickier than that of Flink. Your
performance bottleneck may be Kafka at this point, not Flink.
To make Kafka fast, make sure you have the right setup for the data
directories, and you set up zookeeper properly (for good throughput).
To test the K
Static fields are not parts of the serialized program (by Java's
definition). Whether the static field has the same value in the cluster
JVMs depends on how the static field is initialized, whether it is
initialized the same way in the shipped code, without the program's main
method.
BTW: We are
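A small self-contained illustration of the underlying Java rule (the class and field names are made up for illustration): static fields are not part of an instance's serialized state, so a deserialized copy simply sees whatever the static field holds in the JVM doing the deserialization.

```java
import java.io.*;

// Demonstrates that static fields are not serialized with an instance:
// after changing the static field and then deserializing, the copy sees the
// *current* static value, not the one at serialization time.
public class StaticNotSerialized {
    static class Holder implements Serializable {
        static String staticField = "original";
        String instanceField = "instance";
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new Holder());
        Holder.staticField = "changed after serialization";
        Holder copy = (Holder) deserialize(bytes);
        // instance state round-trips; static state simply isn't in the bytes
        System.out.println(copy.instanceField);   // instance
        System.out.println(Holder.staticField);   // changed after serialization
    }
}
```

This is why, in a cluster, the static field's value depends entirely on how it is initialized in each remote JVM, exactly as described above.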
Thank you for the answer Robert!
I realize it's a single JVM running, yet I would expect programs to behave
in the same way, i.e. serialization to happen (even if not necessary), in
order to catch this kind of bugs before cluster deployment.
Is this simply not possible or is it a design choice we
Dear Marton,
thanks for asking. Yes, it is working now.
But the TPS is not very good. I have met four issues, as below:
1. My TPS is around 2000 events per second, but I saw some companies
achieve 132K per second on a single node at the 2015 Los Angeles Big Data Day
yesterday.
It is working in the IDE because there we execute everything in the same
JVM, so the mapper can access the correct value of the static variable.
When submitting a job with the CLI frontend, there are at least two JVMs
involved, and code running in the JM/TM cannot access the value from the
static