Re: Ceph configuration for checkpoints?

2018-02-13 Thread Piotr Nowojski
Hi, Have you tried referring to the Ceph documentation? http://docs.ceph.com/docs/jewel/cephfs/hadoop/ It claims to be: > a drop-in replacement for the Hadoop File System (HDFS) So I would first try to configure Ceph according to their documentation

Re: Rebalance to subtasks in same TaskManager instance

2018-02-06 Thread Piotr Nowojski
Hi, Unfortunately I don’t think it’s currently possible in Flink. Please feel free to submit a feature request for it on our JIRA https://issues.apache.org/jira/projects/FLINK/summary Have you tried out the setup using rebalance? In m

Re: Reduce parallelism without network transfer.

2018-02-06 Thread Piotr Nowojski
> not rescale ? > > Regards, > Kien > > Sent from TypeApp <http://www.typeapp.com/r?b=11979> > On Feb 5, 2018, at 15:28, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, > > It should work like this out of the box if you use rescale met

Re: Why does jobmanager running needs slot ?

2018-02-05 Thread Piotr Nowojski
ice. > > > > > > At 2018-02-05 17:56:49, "Piotr Nowojski" wrote: > It seems so - but I’m saying this based only on annotations of when this > method was added (in the last couple of months). I’m not that familiar > with those code pa

Re: Why does jobmanager running needs slot ?

2018-02-05 Thread Piotr Nowojski
at FLIP6 ? > > Rice. > > > > > > At 2018-02-05 17:49:05, "Piotr Nowojski" wrote: > I might be wrong but I think it is the other way around and the naming of this > method is correct - it does exactly what it says. TaskManager comes with some > predefined tas

Re: Why does jobmanager running needs slot ?

2018-02-05 Thread Piotr Nowojski
rename it to > requestSlotsFromJobManager. I dont know whether it is sounds OKay for that. I > just feel like offerSlotToJobManager sounds strange.. What do you think of > this ? > > Rice. > > > > > > At 2018-02-05 17:30:32, "Piotr Nowojski" wrote: > org.apache.flink

Re: Checkpoint is not triggering as per configuration

2018-02-05 Thread Piotr Nowojski
Hi, Did you check task manager and job manager logs for any problems? Piotrek > On 5 Feb 2018, at 03:19, syed wrote: > > Hi > I am new to the flink world, and trying to understand. Currently, I am using > Flink 1.3.2 on a small cluster of 4 nodes, > I have configured checkpoint directory at H

Re: Why does jobmanager running needs slot ?

2018-02-05 Thread Piotr Nowojski
org.apache.flink.runtime.jobmaster.JobMaster#offerSlots is the receiver side of an RPC call that is initiated on the sender side: org.apache.flink.runtime.taskexecutor.TaskExecutor#offerSlotsToJobManager. In other words, JobMasterGateway.offerSlots is called by a TaskManager and it is a way

Re: Getting Key from keyBy() in ProcessFunction

2018-02-05 Thread Piotr Nowojski
I think it’s not easily possible right now; however, adding an `OnTimerContext#getCurrentKey()` method might be a valid suggestion. Besides using ValueState as you discussed before, as a kind of workaround you could copy and modify KeyedProcessOperator to suit your needs, but this would be m

Re: Global window keyBy

2018-02-05 Thread Piotr Nowojski
Hi, FIRE_AND_PURGE triggers an `org.apache.flink.api.common.state.State#clear()` call, which "Removes the value mapped under the current key." So other keys should remain unmodified. I hope this solves your problem/question? Piotrek > On 4 Feb 2018, at 15:39, miki haiat wrote: > > Im using t

Re: Reduce parallelism without network transfer.

2018-02-05 Thread Piotr Nowojski
Hi, It should work like this out of the box if you use the rescale method: https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/index.html#physical-partitioning
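
To illustrate why rescale can avoid a full shuffle, here is a minimal plain-Java model of the channel mapping that the linked documentation describes. It is only a sketch, not Flink code: the class and method names are hypothetical, the exact index assignment is an assumption, and it only covers the case where one parallelism is a multiple of the other.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of how rescale connects subtasks; NOT Flink's implementation. */
final class RescaleModel {

    /** Returns the downstream subtask indices that one upstream subtask sends records to. */
    static List<Integer> downstreamTargets(int upstreamIndex, int upstreamParallelism, int downstreamParallelism) {
        if (upstreamIndex < 0 || upstreamIndex >= upstreamParallelism) {
            throw new IllegalArgumentException("upstreamIndex out of range");
        }
        List<Integer> targets = new ArrayList<>();
        if (downstreamParallelism >= upstreamParallelism) {
            // Scaling up: each upstream subtask feeds its own group of downstream subtasks.
            if (downstreamParallelism % upstreamParallelism != 0) {
                throw new IllegalArgumentException("model only covers exact multiples");
            }
            int fanOut = downstreamParallelism / upstreamParallelism;
            for (int i = 0; i < fanOut; i++) {
                targets.add(upstreamIndex * fanOut + i);
            }
        } else {
            // Scaling down: several upstream subtasks feed a single downstream subtask.
            if (upstreamParallelism % downstreamParallelism != 0) {
                throw new IllegalArgumentException("model only covers exact multiples");
            }
            int fanIn = upstreamParallelism / downstreamParallelism;
            targets.add(upstreamIndex / fanIn);
        }
        return targets;
    }
}
```

In this model `RescaleModel.downstreamTargets(1, 2, 4)` yields `[2, 3]` and `RescaleModel.downstreamTargets(3, 4, 2)` yields `[1]`. With rebalance, by contrast, every upstream subtask would send to every downstream subtask.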

Re: Flink not writing last few elements to disk

2018-02-05 Thread Piotr Nowojski
Hi, FileProcessingMode.PROCESS_CONTINUOUSLY processes the file continuously - the stream will not end. A simple `writeAsCsv(…)`, on the other hand, only flushes the output file at the end of the stream (see `OutputFormatSinkFunction`). You can either use `PROCESS_ONCE` mode or use a more advanced data sink: -

Re: Latest version of Kafka

2018-02-02 Thread Piotr Nowojski
Hi, As of now, Flink only provides a connector for Kafka 0.11, which uses KafkaClient in version 0.11.x. However, you should be able to use it for reading from/writing to Kafka 1.0 - Kafka claims (and as far as I know it’s true) that Kafka 1.0 is backward compatible with 0.11. Piotrek > O

Re: Send ACK when all records of file are processed

2018-01-30 Thread Piotr Nowojski
l get complicated. > > > > Regards, > Vinay Patil > > On Tue, Jan 30, 2018 at 1:57 AM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Thanks for the clarification :) > > Since you have one Job per ACK, you can just rely on > Watermark(Lo

Re: Send ACK when all records of file are processed

2018-01-25 Thread Piotr Nowojski
> > instead of add sink, should it be a simple map operator which writes to DB > so that we can have a next ack operator which will generate the response. > > Also, how do I get/access the Watermark value in the ack operator ? It will > be a simple map operator, right ? > >

Re: Data exchange between tasks (operators/sources) at streaming api runtime

2018-01-25 Thread Piotr Nowojski
> Ishwara Varnasi > > Sent from my iPhone > > On Jan 25, 2018, at 4:03 AM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > >> Hi, >> >> As far as I know there is currently no simple way to do this: >> Join stream with static d

Re: Send ACK when all records of file are processed

2018-01-25 Thread Piotr Nowojski
Hi, As you figured out, some dummy EOF record is one solution, however you might try to achieve it also by wrapping an existing CSV function. Your wrapper could emit this dummy EOF record. Another (probably better) idea is to use Watermark(Long.MAX_VALUE) for the EOF marker. Stream source and/o

Re: Avoiding deadlock with iterations

2018-01-25 Thread Piotr Nowojski
Hi, This is a known problem and I don’t think there is an easy solution to this. Please refer to the: http://mail-archives.apache.org/mod_mbox/flink-user/201704.mbox/%3c5486a7fd-41c3-4131-5100-272825088...@gaborhermann.com%3E

Re: Data exchange between tasks (operators/sources) at streaming api runtime

2018-01-25 Thread Piotr Nowojski
Hi, As far as I know there is currently no simple way to do this: Join stream with static data in https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API and https://issue

Re: What's the meaning of "Registered `TaskManager` at akka://flink/deadLetters " ?

2018-01-19 Thread Piotr Nowojski
log file, > > Thanks > > On Mon, Jan 15, 2018 at 3:36 PM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, > > Could you post full job manager and task manager logs from startup until the > first signs of the problem? > > Thanks, Piotrek >

Re: What's the meaning of "Registered `TaskManager` at akka://flink/deadLetters " ?

2018-01-15 Thread Piotr Nowojski
at ??? (akka://flink/deadLetters) as > 626846ae27a833cb094eeeb047a6a72c. Current number of registered hosts is 2. > Current number of alive task slots is 40. > > > On Wed, Jan 10, 2018 at 11:36 AM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, > >

Re: Stream job failed after increasing number retained checkpoints

2018-01-10 Thread Piotr Nowojski
limitation in the strict sense, but you might run out of > dfs space or job manager memory if you keep around a huge number checkpoints. > I wonder what reason you might have that you ever want such a huge number of > retained checkpoints? Usually keeping one checkpoint should do the job,

Re: Datastream broadcast with KeyBy

2018-01-10 Thread Piotr Nowojski
Hi, Could you elaborate on the problem that you are having? What exception(s) are you getting? I have tested this simple example and it seems to be working as expected: DataStreamSource input = env.fromElements(1, 2, 3, 4, 5, 1, 2, 3); DataStreamSource confStream = env.fromE

Re: Custom Partitioning for Keyed Streams

2018-01-10 Thread Piotr Nowojski
Hi, I don’t think it is possible to enforce scheduling of two keys to different nodes, since all of that is based on hashes. For some cases, doing the pre-aggregation step (initial aggregation done before keyBy, which is followed by final aggregation after the keyBy) can be the solution for ha
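
The two-step pre-aggregation idea mentioned above can be sketched in plain Java, independent of the Flink API. This is only an illustration under assumptions: the class and method names are hypothetical, and counting records per key stands in for whatever aggregation the job actually performs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of pre-aggregation before a keyBy; names are hypothetical. */
final class PreAggregationModel {

    /** Step 1: each parallel instance counts its own records locally, before any shuffle. */
    static Map<String, Long> preAggregate(List<String> localRecords) {
        Map<String, Long> partial = new HashMap<>();
        for (String key : localRecords) {
            partial.merge(key, 1L, Long::sum);
        }
        return partial;
    }

    /** Step 2: after the keyBy, the partial counts for each key are merged into the final result. */
    static Map<String, Long> finalAggregate(List<Map<String, Long>> partials) {
        Map<String, Long> result = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            partial.forEach((key, count) -> result.merge(key, count, Long::sum));
        }
        return result;
    }
}
```

The point of the design is that only one partial result per key and instance crosses the network, rather than every single record of a frequent key.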

Re: What's the meaning of "Registered `TaskManager` at akka://flink/deadLetters " ?

2018-01-10 Thread Piotr Nowojski
Hi, Search both job manager and task manager logs for the IP address(es) and port(s) that have timed out. First of all, make sure that the nodes are visible to each other using a simple ping. Afterwards, please check that the ports that timed out are open and not blocked by a firewall (telnet). You

Re: Stream job failed after increasing number retained checkpoints

2018-01-09 Thread Piotr Nowojski
Hi, Increasing akka’s timeouts is rarely a solution for any problem - it either does not help or just masks the issue, making it less visible. But yes, it is possible to bump the limits: https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#distributed-coordination-via-akk

Re: periodic trigger

2018-01-03 Thread Piotr Nowojski
received later then i might receive more > than one row for a single user based on the number of windows created by the > events of this user. That will make the average computations wrong. > > On 22.12.2017 12:10, Piotr Nowojski wrote: >> Ok, I think now I understand your problem

Re: entrypoint for executing job in task manager

2017-12-22 Thread Piotr Nowojski
or opening. > > On Thu, Dec 21, 2017 at 12:27 AM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Open method is called just before any elements are processed. You can hook in > any initialisation logic there, including initialisation of a static context. > Howe

Re: periodic trigger

2017-12-22 Thread Piotr Nowojski
ote: > > Imagine a case where i want to run a computation every X seconds for 1 day > window. I want the calculate average session length for current day every X > seconds. Is there an easy way to achieve that? > > On 21.12.2017 16:06, Piotr Nowojski wrote: >> Hi, >>

Re: periodic trigger

2017-12-21 Thread Piotr Nowojski
Hi, You defined a tumbling window (https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#tumbling-windows ) of 60 seconds, triggered every 10 seconds. This means that each inpu

Re: Metric reporters with non-static ports

2017-12-21 Thread Piotr Nowojski
I am not sure (and because of the holiday season you might not get an answer quickly), however I do not see a way to obtain this port other than by looking into the log files. On the other hand, I have the impression that the intention of this feature was that if you must execute N reporters on one sing

Re: entrypoint for executing job in task manager

2017-12-21 Thread Piotr Nowojski
The open method is called just before any elements are processed. You can hook in any initialisation logic there, including initialisation of a static context. However, keep in mind that since this context is static, it will be shared between multiple operators (if you are running parallelism > numb

Re: Custom Metrics

2017-12-14 Thread Piotr Nowojski
.meter(userId, new DropwizardMeterWrapper(new > com.codahale.metrics.Meter())); > Thanks a bunch. > > On Mon, Dec 11, 2017 at 11:12 PM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, > > Reporting once per 10 seconds shouldn’t create

Re: Flink flick cancel vs stop

2017-12-14 Thread Piotr Nowojski
e Kafka transactions to be committed and is > not atomic. > > So it seems like there is a need for an atomic stop or cancel with savepoint > that waits for transactional sinks to commit and then immediately stops any > further message processing. > > > On Tue, Oct 24, 2017

Re: Off heap memory issue

2017-12-13 Thread Piotr Nowojski
he same behavior for arrays of Chars and > Bytes (as expected), but for this particular Class "java.lang.Class" the > clusters that have 24/7 jobs have less than 20K instances of that class, > whereas the other cluster has 383,120 > instances. I don't know if this c

Re: How to deal with dynamic types

2017-12-11 Thread Piotr Nowojski
Hi, For a truly dynamic class you would need a custom TypeInformation or TypeDeserializationSchema and store the fields in some kind of Map. Maybe something could be done with inheritance if records that always share the same fields could be deserialized to some specific class with fixed/predefin

Re: Custom Metrics

2017-12-11 Thread Piotr Nowojski
etrics. > > The option 2 that I had in mind was to collect all metrics and use influx db > sink to report it directly inside the pipeline. But it seems reporting per > node might not be possible. > > > On Mon, Dec 11, 2017 at 3:14 AM, Piotr Nowojski <mailto:pi...

Re: REST api: how to upload jar?

2017-12-11 Thread Piotr Nowojski
Hi, Have you tried this https://stackoverflow.com/questions/41724269/apache-flink-rest-client-jar-upload-not-working ? Piotrek > On 11 Dec 2017, at 14:22, Edward wrote: > > Let me try that again

Re: Exception when using the time attribute in table API

2017-12-11 Thread Piotr Nowojski
Hi, NoSuchMethodError probably comes from mismatched compile/runtime versions of Flink. Do you have to use the 1.4-SNAPSHOT version? It can change on a daily basis, so you have to be more careful about which Flink jars you are using at runtime and which at compile time. If you really need som

Re: ayncIO & TM akka response

2017-12-11 Thread Piotr Nowojski
Hi, Please search the task manager logs for the potential reason of failure/disconnecting around the time when you got this error on the job manager. There should be some clearly visible exception. Thanks, Piotrek > On 9 Dec 2017, at 20:35, Chen Qin wrote: > > Hi there, > > In recent, our

Re: Custom Metrics

2017-12-11 Thread Piotr Nowojski
Hi, I’m not sure if I completely understand your issue. 1. - You don’t have to pass RuntimeContext, you can always pass just the MetricGroup or ask your components/subclasses “what metrics do you want to register” and register them at the top level. - Reporting tens/hundreds/thousands of metric

Re: How & Where does flink stores data for aggregations.

2017-11-24 Thread Piotr Nowojski
Hi, Flink will have to maintain state of the defined aggregations per each window and key (the more names you have, the bigger the state). Flink’s state backend will be used for that (for example memory or rocksdb). However in most cases state will be small and not dependent on the length of t

Re: Correlation between data streams/operators and threads

2017-11-21 Thread Piotr Nowojski
different from the current > approach I had provided the code for above, and will it solve this problem of > different data streams not getting distributed across slots? > > Thanks again, > Shailesh > > On Fri, Nov 17, 2017 at 3:01 PM, Piotr Nowojski <mailto:pi...@data-a

Re: org.apache.flink.runtime.io.network.NetworkEnvironment causing memory leak?

2017-11-17 Thread Piotr Nowojski
Hi, If the TM is not responding, check the TM logs for any long gaps. There might be three main reasons for such gaps: 1. Machine is swapping - set up/configure your machine/processes so that the machine never swaps (best to disable swap altogether) 2. Long GC full stops - look how to ana

Re: Flink memory leak

2017-11-17 Thread Piotr Nowojski
alVm and Console, we > attached the screenshot of results. > Could you help us about evaluating the results? > > -Ebru > > >> On 14 Nov 2017, at 19:29, Piotr Nowojski > <mailto:pi...@data-artisans.com>> wrote: >> >> Best would be to analyse memory usage via

Re: Correlation between data streams/operators and threads

2017-11-17 Thread Piotr Nowojski
tly nothing happens, can you provide > the output of "jstack " (with the PID of your JobManager)? > f) You may further be able to debug into what is happening by running this in > your IDE in debug mode and pause the execution when you suspect it to hang. > > > Ni

Re: Off heap memory issue

2017-11-15 Thread Piotr Nowojski
Hi, I have been able to observe some off heap memory “issues” by submitting the Kafka job provided by Javier Lopez (in a different mailing list thread). TL;DR: There was no memory leak; the memory pools “Metaspace” and “Compressed Class Space” just grow in size over time and are only rarely garbage co

Re: Flink memory leak

2017-11-14 Thread Piotr Nowojski
2017 at 4:02 PM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Ebru, Javier, Flavio: > > I tried to reproduce memory leak by submitting a job, that was generating > classes with random names. And indeed I have found one. Memory was > accumulating

Re: Flink memory leak

2017-11-14 Thread Piotr Nowojski
f not, I would really need more help/information from you to track this down. Piotrek > On 10 Nov 2017, at 15:12, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-11-10 13:14, Piotr Nowojski wrote: >> jobmanager1.log and taskmanager2.log are the same. Can you also submit

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
isualization. > > 3. Have attached the logs and exception raised (15min - configured akka > timeout) after submitting the job. > > Thanks, > Shailesh > > > On Tue, Nov 14, 2017 at 2:46 PM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, >

Re: Correlation between data streams/operators and threads

2017-11-14 Thread Piotr Nowojski
being deployed per device instead of > separate streams help here? > > 4. Is there an upper limit on number task slots which can be configured? I > know that my operator state size at any given point in time would not be very > high, so it looks OK to deploy independent jobs w

Re: Correlation between data streams/operators and threads

2017-11-13 Thread Piotr Nowojski
Sure, let us know if you have other questions or encounter some issues. Thanks, Piotrek > On 13 Nov 2017, at 14:49, Shailesh Jain wrote: > > Thanks, Piotr. I'll try it out and will get back in case of any further > questions. > > Shailesh > > On Fri, Nov 10, 20

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
flink-pnowojski-taskmanager-9-piotr-mbp.log Description: Binary data > On 10 Nov 2017, at 16:05, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-11-10 18:01, ÇETİNKAYA EBRU ÇETİNKAYA EBRU wrote: >> On 2017-11-10 17:50, Piotr Nowojski wrote: >>> I do not see anything

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
> On 2017-11-10 13:14, Piotr Nowojski wrote: >> jobmanager1.log and taskmanager2.log are the same. Can you also submit >> files containing std output? >> Piotrek >>> On 10 Nov 2017, at 09:35, ÇETİNKAYA EBRU ÇETİNKAYA EBRU >>> wrote: >>> On 2017-11-10

Re: Correlation between data streams/operators and threads

2017-11-10 Thread Piotr Nowojski
t order will be correct almost always) and > there are high chances of losing out on events on operators like Patterns > which work with windows. Any ideas for workarounds here? > > > Thanks, > Shailesh > > On 09-Nov-2017 8:48 PM, "Piotr Nowojski"

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
jobmanager1.log and taskmanager2.log are the same. Can you also submit files containing std output? Piotrek > On 10 Nov 2017, at 09:35, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-11-10 11:04, Piotr Nowojski wrote: >> Hi, >> Thanks for the logs, however I do

Re: Flink memory leak

2017-11-10 Thread Piotr Nowojski
BRU > wrote: > > On 2017-11-09 20:08, Piotr Nowojski wrote: >> Hi, >> Could you attach full logs from those task managers? At first glance I >> don’t see a connection between those exceptions and any memory issue >> that you might have had. It looks like a dependency

Re: Flink memory leak

2017-11-09 Thread Piotr Nowojski
ausing the problem? There might be some Flink incompatibility between different versions and rebuilding a job’s jar with a version matching the cluster version might help. Piotrek > On 9 Nov 2017, at 17:36, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > wrote: > > On 2017-11-08 18:30, Pi

Re: How to best create a bounded session window ?

2017-11-09 Thread Piotr Nowojski
ever increasing. We may decide to fire_and_purge. fire > etc but the window remains live. Or did I get that part wrong ? > > > Vishal. > > > > > On Thu, Nov 9, 2017 at 8:24 AM, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > It might be more

Re: Correlation between data streams/operators and threads

2017-11-09 Thread Piotr Nowojski
Hi, 1. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/parallel.html The number of threads executing would be, roughly speaking, equal to the number of input data streams multiplied by the parallelism.

Re: Weird performance on custom Hashjoin w.r.t. parallelism

2017-11-09 Thread Piotr Nowojski
Hi, Yes, as you correctly analysed, parallelism 1 was causing problems, because it meant that all of the records had to be gathered over the network from all of the task managers. Keep in mind that even if you increase parallelism to “p”, every change in parallelism can slow down your application

Re: How to best create a bounded session window ?

2017-11-09 Thread Piotr Nowojski
It might be more complicated if you want to take into account events coming in out of order. For example you limit length of window to 5 and you get the following events: 1 2 3 4 6 7 8 5 Do you want to emit windows: [1 2 3 4 5] (length limit exceeded) + [6 7 8] ? Or are you fine with interlea
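
The first option from the example above, windows [1 2 3 4 5] and [6 7 8], can be reproduced with a small plain-Java sketch. It rests on assumptions: "length" is read as the timestamp span of a window, all events are available up front so they can be sorted, and the class name is hypothetical. A streaming job cannot simply sort, so it would still have to decide how to treat the late element 5.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Batch-style model of length-bounded windows; not a Flink window assigner. */
final class BoundedWindowModel {

    /** Sorts events by timestamp and starts a new window whenever the span would reach maxLength. */
    static List<List<Long>> split(List<Long> timestamps, long maxLength) {
        List<Long> sorted = new ArrayList<>(timestamps);
        Collections.sort(sorted);

        List<List<Long>> windows = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sorted) {
            // A window that starts at s may hold timestamps s .. s + maxLength - 1.
            if (!current.isEmpty() && ts - current.get(0) >= maxLength) {
                windows.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            windows.add(current);
        }
        return windows;
    }
}
```

Calling `BoundedWindowModel.split` with the events 1 2 3 4 6 7 8 5 and a limit of 5 returns the two windows [1, 2, 3, 4, 5] and [6, 7, 8].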

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
time with `fromElements` source instead of >> > Kafka, right? >> > Did you try it also without a Kafka producer? >> > Piotrek >> > >> > On 8 Nov 2017, at 14:57, Javier Lopez > > <mailto:javier.lo...@zalando.de>> wrote: >> > Hi, >>

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
heap size. Piotrek > On 8 Nov 2017, at 15:28, Javier Lopez wrote: > > Yes, I tested with just printing the stream. But it could take a lot of time > to fail. > > On Wednesday, 8 November 2017, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > > Thanks

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
die faster. I tested as well with a > small data set, using the fromElements source, but it will take some time to > die. It's better with some data. > > On 8 November 2017 at 14:54, Piotr Nowojski <mailto:pi...@data-artisans.com>> wrote: > Hi, > > Thanks for

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
"; and nothing seems to work. > > Thanks for your help. > > On 8 November 2017 at 13:26, ÇETİNKAYA EBRU ÇETİNKAYA EBRU > mailto:b20926...@cs.hacettepe.edu.tr>> wrote: > On 2017-11-08 15:20, Piotr Nowojski wrote: > Hi Ebru and Javier, > > Yes, if you could shar

Re: Flink memory leak

2017-11-08 Thread Piotr Nowojski
Hi Ebru and Javier, Yes, if you could share this example job it would be helpful. Ebru: could you explain in a little more detail what your job(s) look like? Could you post some code? If you are just using maps and filters there shouldn’t be any network transfers involved, aside from Sourc

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-06 Thread Piotr Nowojski
.StreamInputProcessor.processInput(StreamInputProcessor.java:213) > at > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:69) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:

Re: Docker-Flink Project: TaskManagers can't talk to JobManager if they are on different nodes

2017-11-02 Thread Piotr Nowojski
Did you try to expose required ports that are listed in the README when starting the containers? https://github.com/apache/flink/tree/master/flink-contrib/docker-flink Ports: • The Web Client is on port 48081

Re: Flink flick cancel vs stop

2017-10-24 Thread Piotr Nowojski
I would propose implementations of NewSource to be non-blocking/asynchronous. For example something like public abstract Future getCurrent(); which would allow us to perform certain actions while there is no data available to process (for example flush output buffers). Something like this

Re: Write each group to its own file

2017-10-23 Thread Piotr Nowojski
You’re welcome :) > On 23 Oct 2017, at 20:43, Rodrigo Lazoti wrote: > > Piotr, > > I did as you suggested and it worked perfectly. > Thank you! :) > > Best, > Rodrigo > > On Thu, Oct 12, 2017 at 5:11 AM, Piotr Nowojski <mailto:pi...@data-artisans

Re: HBase config settings go missing within Yarn.

2017-10-23 Thread Piotr Nowojski
mple project that reproduces the problem on my setup: > https://github.com/nielsbasjes/FlinkHBaseConnectProblem > <https://github.com/nielsbasjes/FlinkHBaseConnectProblem> > > Niels Basjes > > > On Fri, Oct 20, 2017 at 6:54 PM, Piotr Nowojski <mailto:pi...@data-artis

Re: SLF4j logging system gets clobbered?

2017-10-23 Thread Piotr Nowojski
Till could you take a look at this? Piotrek > On 18 Oct 2017, at 20:32, Jared Stehler > wrote: > > I’m having an issue where I’ve got logging setup and functioning for my > flink-mesos deployment, and works fine up to a point (the same point every > time) where it seems to fall back to “defa

Re: flink can't read hdfs namenode logical url

2017-10-23 Thread Piotr Nowojski
nt.java:619) > at > org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149) > -- > From: Piotr Nowojski > Sent: 2017-10-20 (Friday) 21:39 > To: 邓俊华 > Cc: user > Subject: Re: flink can't read hdfs namenode logica

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Piotr Nowojski
e settings. > So it seems in the transition into the cluster the application does not copy > everything it has available locally for some reason. > > There is a very high probability I did something wrong, I'm just not seeing > it at this moment. > > Niels > > &

Re:

2017-10-20 Thread Piotr Nowojski
Hi, Only batch API is using managed memory. If you are using streaming API, you can do two things: - estimate max cache size based on for example fraction of max heap size - use WeakReference to implement your cache In batch API, you could estimate max cache size based on: - fraction of (heapSi
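
For the streaming case, the two suggestions above can be sketched in plain Java. This is a minimal illustration rather than a production cache: the class name and the `heapFraction` parameter are hypothetical, the class is not thread-safe, and how many bytes an entry occupies is left to the caller to estimate.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/** Minimal cache whose values can be reclaimed by the garbage collector. */
final class WeakValueCache<K, V> {

    private final Map<K, WeakReference<V>> entries = new HashMap<>();

    void put(K key, V value) {
        entries.put(key, new WeakReference<>(value));
    }

    /** Returns the cached value, or null if it was never cached or has been garbage collected. */
    V get(K key) {
        WeakReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key); // the value was collected, so drop the stale entry
        }
        return value;
    }

    /** First suggestion: derive a size budget from a fraction of the maximum heap size. */
    static long maxCacheBytes(double heapFraction) {
        return (long) (Runtime.getRuntime().maxMemory() * heapFraction);
    }
}
```

A weakly referenced value can disappear as soon as nothing else holds a strong reference to it, so callers must always be prepared to recompute or reload a missing entry.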

Re: flink can't read hdfs namenode logical url

2017-10-20 Thread Piotr Nowojski
Hi, Please double check the content of config files in YARN_CONF_DIR and HADOOP_CONF_DIR (the first one has a priority over the latter one) and that they are pointing to correct files. Also check logs (WARN and INFO) for any relevant entries. Piotrek > On 20 Oct 2017, at 06:07, 邓俊华 wrote: >

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-20 Thread Piotr Nowojski
> >> ---- Original message > >> From: Piotr Nowojski pi...@data-artisans.com > >> Subject: Re: java.lang.NoSuchMethodError and dependencies problem > >> To: "r. r." > >> Sent: 20.10.2017 14:46 > > > > >

Re: Does heap memory gets released after computing windows?

2017-10-20 Thread Piotr Nowojski
Hi, Memory used by session windows should be released once the window is triggered (allowedLateness can prolong the window’s life). Unless your code introduces some memory leak (by not releasing references) everything should be garbage collected. Keep in mind that session windows with time gap of 10 m

Re: HBase config settings go missing within Yarn.

2017-10-20 Thread Piotr Nowojski
Hi, What do you mean by saying: > When I open the logfiles on the Hadoop cluster I see this: The error doesn’t come from Flink? Where do you execute hbaseConfig.addResource(new Path("file:/etc/hbase/conf/hbase-site.xml")); ? To me it seems like it is a problem with misconfigured HBase and n

Re: Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Piotr Nowojski
of moving the file into final state. > I thought, that maybe someone has already implemented such thing or knows any > other approaches that will help me to not copy/ paste existing sink impl )) > > Thx ! > > >> On 20 Oct 2017, at 14:37, Piotr Nowojski > <mailto:pi...@data-artisans.com>> wrote: >> >> Piotrek >

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-20 Thread Piotr Nowojski
s path affect this? > > by shade commons-compress do you mean : > > it doesn't have effect either > > as a last resort i may try to rebuild Flink to use 1.14, but don't want to go > there yet =/ > > > Best regards > > > > > > >> -

Re: Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Piotr Nowojski
Hi, Maybe you can just list files in your basePath and filter out those that have inProgress or pending suffixes? I think you could wrap/implement your own Bucketer and track all the paths that it returns. However some of those might be pending or in progress files that will be committed in t
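
The filtering step suggested above can be sketched in plain Java. The class name is hypothetical, and the suffix values used in the usage note (`.in-progress` and `.pending`) are assumptions; check the values actually configured on your sink, since they can be changed.

```java
import java.util.ArrayList;
import java.util.List;

/** Keeps only file names that carry neither the in-progress nor the pending suffix. */
final class FinalFileFilter {

    static List<String> finalFiles(List<String> fileNames, String inProgressSuffix, String pendingSuffix) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            boolean unfinished = name.endsWith(inProgressSuffix) || name.endsWith(pendingSuffix);
            if (!unfinished) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For example, given the names `part-0-0`, `_part-0-1.in-progress` and `_part-0-2.pending` with the assumed suffixes, only `part-0-0` is returned. The input list could come from listing the base path with whatever file system client the job already uses.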

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
application pom.xml. I’m not sure if this is solvable in some way, or not. Maybe as a workaround, you could shade commons-compress usages in your pom.xml? Piotr Nowojski > On 19 Oct 2017, at 17:36, r. r. wrote: > > flink is started with bin/start-local.sh > > there is no classp

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
With aitozi we have a hat trick oO > On 19 Oct 2017, at 17:08, Tzu-Li (Gordon) Tai wrote: > > Ah, sorry for the duplicate answer, I didn’t see Piotr’s reply. Slight delay > on the mail client. > > > On 19 October 2017 at 11:05:01 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org >

Re: Watermark on connected stream

2017-10-19 Thread Piotr Nowojski
Hi, As you can see in org.apache.flink.streaming.api.operators.AbstractStreamOperator#processWatermark1 it takes the minimum of both inputs. Piotrek > On 19 Oct 2017, at 14:06, Kien Truong wrote: > > Hi, > > If I connect two stream with different watermark, how are the watermark of >
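
The "minimum of both inputs" rule can be modelled in a few lines of plain Java. This is a simplified sketch and not the Flink source: the class and method names are hypothetical, and it assumes that each input starts at `Long.MIN_VALUE` until its first watermark arrives.

```java
/** Simplified model of how a two-input operator combines watermarks. */
final class TwoInputWatermarkModel {

    private long input1Watermark = Long.MIN_VALUE;
    private long input2Watermark = Long.MIN_VALUE;
    private long combinedWatermark = Long.MIN_VALUE;

    /** Records a watermark from the first input and returns the operator's current watermark. */
    long onWatermark1(long timestamp) {
        input1Watermark = timestamp;
        return advance();
    }

    /** Records a watermark from the second input and returns the operator's current watermark. */
    long onWatermark2(long timestamp) {
        input2Watermark = timestamp;
        return advance();
    }

    private long advance() {
        // The operator can only be as far along as its slowest input.
        long newMin = Math.min(input1Watermark, input2Watermark);
        if (newMin > combinedWatermark) {
            combinedWatermark = newMin;
        }
        return combinedWatermark;
    }
}
```

In this model, a watermark of 10 on the first input alone changes nothing; once the second input reports 5 the combined watermark becomes 5, and when the second input later reports 20 it becomes 10.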

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread Piotr Nowojski
Hi, What is the full stack trace of the error? Are you sure that there is no commons-compress somewhere in the classpath (like in the lib directory)? How are you running your Flink cluster? Piotrek > On 19 Oct 2017, at 13:34, r. r. wrote: > > Hello > I have a job that runs an Apache Tika pip

Re: Set heap size

2017-10-19 Thread Piotr Nowojski
Hi, Just log into the machine and check its memory consumption under load using htop or a similar tool. Remember to subtract Flink’s memory usage and the file system cache. Piotrek > On 19 Oct 2017, at 10:15, AndreaKinn wrote: > > About task manager heap size Flink doc says: > >

Re: SLF4j logging system gets clobbered?

2017-10-19 Thread Piotr Nowojski
Hi, What versions of Flink/logback are you using? Have you read this: https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/best_practices.html#use-logback-when-running-flink-out-of-the-ide--from-a-java-application

Re: Monitoring job w/LocalStreamEnvironment

2017-10-16 Thread Piotr Nowojski
hanks for responding, see below. > >> On Oct 12, 2017, at 7:51 AM, Piotr Nowojski > <mailto:pi...@data-artisans.com>> wrote: >> >> Hi, >> >> Have you read the following doc? >> https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/strea

Re: Submitting a job via command line

2017-10-13 Thread Piotr Nowojski
Good to hear that :) > On 13 Oct 2017, at 14:40, Alexander Smirnov wrote: > > Thank you so much, it helped! > > From: Piotr Nowojski <mailto:pi...@data-artisans.com>> > Date: Thursday, October 12, 2017 at 6:00 PM > To: Alexander Smirnov mailto:as

Re: Writing an Integration test for flink-metrics

2017-10-13 Thread Piotr Nowojski
ler wrote: > You could also write a custom reporter that opens a socket or similar for > communication purposes. > > You can then either query it for the metrics, or even just trigger the > verification in the reporter, > and fail with an error if the reporter returns an error. >

Re: Beam Application run on cluster setup (Kafka+Flink)

2017-10-13 Thread Piotr Nowojski
Hi, What version of Flink are you using? In earlier 1.3.x releases there were some bugs in the Kafka consumer code. Could you change the log level in Flink to debug? Did you check the Kafka logs for some hint maybe? I guess that metrics like bytes read/input records of this Flink application are not

Re: Submitting a job via command line

2017-10-12 Thread Piotr Nowojski
Have you tried this http://mail-archives.apache.org/mod_mbox/flink-user/201705.mbox/%3ccagr9p8bxhljseexwzvxlk+drotyp1yxjy4n4_qgerdzxz8u...@mail.gmail.com%3E

Re: Monitoring job w/LocalStreamEnvironment

2017-10-12 Thread Piotr Nowojski
Hi, Have you read the following doc? https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/testing.html There are some hints regarding testing your application. Especially take a look at the

Re: Beam Application run on cluster setup (Kafka+Flink)

2017-10-12 Thread Piotr Nowojski
Hi, What do you mean by: > With standalone beam application kafka can receive the message, But in cluster setup it is not working. In your example you are reading the data from Kafka and printing them to console. There doesn’t seem to be anything that writes back to Kafka, so what do you mean

Re: Writing to an HDFS file from a Flink stream job

2017-10-12 Thread Piotr Nowojski
tml > > <https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/filesystem_sink.html> > > Best, > Aljoscha > >> On 12. Oct 2017, at 14:55, Piotr Nowojski > <mailto:pi...@data-artisans.com>> wrote: >> >> Hi, >> >>

Re: Writing to an HDFS file from a Flink stream job

2017-10-12 Thread Piotr Nowojski
Hi, Maybe this is an access rights issue? Could you try to create and write to the same file (same directory) in some other way (manually?), using the same user and the same machine as the Flink job would? Maybe there will be some hint in the HDFS logs? Piotrek > On 12 Oct 2017, at 00:19, Isuru Suriar

Re: Writing an Integration test for flink-metrics

2017-10-12 Thread Piotr Nowojski
Hi, Doing as you proposed using JMXReporter (or custom reporter) should work. I think there is no easier way to do this at the moment. Piotrek > On 12 Oct 2017, at 04:58, Colin Williams > wrote: > > I have a RichMapFunction and I'd like to ensure Meter fields are properly > incremented. I'v

Re: Kafka 11 connector on Flink 1.3

2017-10-12 Thread Piotr Nowojski
Hi, The Kafka 0.11 connector depends on some API changes in Flink 1.4, so without rebasing the code and solving some small issues it is not possible to use it with 1.3.x. We are about to finalize the timeframe for the 1.4 release; it would be great if you could come back with this question after the
