Re: FlinkKafkaConsumer010 does not start from the next record on startup from offsets in Kafka

2017-11-23 Thread r. r.
Gordon, thanks for clarifying this! For my experimental project I decided to disable checkpointing and use kafkaConsumer.setStartFromGroupOffsets() (explicitly, although the docs state it is the default). I verified with kafka-consumer-offset-checker.sh that, after the job fails and is restarted,
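The setup described above can be sketched as follows. This is a minimal example, not the poster's actual code; the broker address, group id, and topic name are hypothetical placeholders, and checkpointing is deliberately left disabled as in the experiment:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class GroupOffsetsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is NOT enabled here; with group offsets, commits back to
        // Kafka then happen on the consumer's auto-commit interval.

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "my-group");                // placeholder

        FlinkKafkaConsumer010<String> consumer =
                new FlinkKafkaConsumer010<>("my-topic", new SimpleStringSchema(), props);
        // Explicit, although this is the default start position:
        consumer.setStartFromGroupOffsets();

        env.addSource(consumer).print();
        env.execute("start-from-group-offsets");
    }
}
```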

Re: FlinkKafkaConsumer010 does not start from the next record on startup from offsets in Kafka

2017-11-22 Thread r. r.
Thanks, Gordon. But what if there is an uncaught exception while processing the record (during normal job execution, after deserialization)? After the restart strategy exceeds the failure rate, the job will fail, and on re-run it would start at the same offset, right? Is there a way to avoid this
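The failure-rate restart strategy mentioned above can be configured programmatically; a minimal sketch (the thresholds are illustrative, and the poison-record problem itself would still need a try/catch around the processing logic):

```java
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategyExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Tolerate at most 3 failures within a 5-minute window, waiting 10 s
        // between restart attempts; once the rate is exceeded, the job fails.
        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                3,
                Time.of(5, TimeUnit.MINUTES),
                Time.of(10, TimeUnit.SECONDS)));
        // ... build and execute the pipeline ...
    }
}
```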

Re: all task managers reading from all kafka partitions

2017-11-18 Thread r. r.
> Original message >From: Gary Yao g...@data-artisans.com >Subject: Re: all task managers reading from all kafka partitions >To: "r. r." <rob...@abv.bg> >Sent: 18.11.2017 11:28

Re: all task managers reading from all kafka partitions

2017-11-17 Thread r. r.
! > Original message >From: Gary Yao g...@data-artisans.com >Subject: Re: all task managers reading from all kafka partitions >To: "r. r." <rob...@abv.bg> >Sent: 17.11.2017 22:58

Re: all task managers reading from all kafka partitions

2017-11-17 Thread r. r.
localhost > Original message >From: Gary Yao g...@data-artisans.com >Subject: Re: all task managers reading from all kafka partitions >To: "r. r." <rob...@abv.bg> >Sent: 17.11.2017 20:02

all task managers reading from all kafka partitions

2017-11-17 Thread r. r.
Hi, I have this strange problem: 4 task managers, each with one task slot, attached to the same Kafka topic, which has 10 partitions. When I post a single message to the Kafka topic, it seems that all 4 consumers fetch the message and start processing (confirmed by TM logs). If I run

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-20 Thread r. r.
! > Original message >From: Piotr Nowojski pi...@data-artisans.com >Subject: Re: java.lang.NoSuchMethodError and dependencies problem >To: "r. r." <rob...@abv.bg> >Sent: 20.10.2017 14:46 > But you said > > > this seems to wor

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
regards > Original message >From: Piotr Nowojski pi...@data-artisans.com >Subject: Re: java.lang.NoSuchMethodError and dependencies problem >To: "r. r." <rob...@abv.bg> >Sent: 19.10.2017 20:04 > I’m not 100% sure, so tr

Re: java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
ns.com >Subject: Re: java.lang.NoSuchMethodError and dependencies problem >To: "r. r." <rob...@abv.bg> >Sent: 19.10.2017 18:00 > Hi, > > What is the full stack trace of the error? > Are you sure that there is no commons-compress somewhere in

java.lang.NoSuchMethodError and dependencies problem

2017-10-19 Thread r. r.
Hello, I have a job that runs an Apache Tika pipeline, and it fails with "Caused by: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamFactory.detect(Ljava/io/InputStream;)Ljava/lang/String;" Flink includes commons-compress 1.4.1, while Tika needs 1.14. I also have
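A common way out of this kind of classpath clash is to relocate the newer commons-compress inside the job jar with the maven-shade-plugin. This is only a sketch, assuming a Maven build; the shaded package prefix is an arbitrary choice and any other dependencies would need their own handling:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Move Tika's commons-compress 1.14 out of the way of the
                 1.4.1 copy that ships on Flink's classpath -->
            <pattern>org.apache.commons.compress</pattern>
            <shadedPattern>shaded.org.apache.commons.compress</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```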

Re: Best way to setup different log files for distinct jobs

2017-10-12 Thread r. r.
What if you have 'dedicated' task managers for each job? So if you have 2 TMs, each with 1 task slot, and two jobs with -p1, then each job will go to the respective TM, I think? Hence, each job ends up in its own (TM's) log. I'm new to Flink, hope it makes sense > Original message

keep-alive job strategy

2017-10-06 Thread r. r.
Hello, I have set up a cluster and added task managers manually with bin/taskmanager.sh start. I noticed that if I have 5 task managers with one slot each and start a job with -p5, then if I stop a taskmanager the job will fail even if there are 4 more taskmanagers. Is this expected (I turned

Re: kafka consumer parallelism

2017-10-05 Thread r. r.
Thanks a lot, Carst! I hadn't realized that. Best regards > Original message >From: Carst Tankink ctank...@bol.com >Subject: Re: kafka consumer parallelism >To: "r. r." <rob...@abv.bg> >Sent: 05.10.2017 09:04 > Hi, >

Re: kafka consumer parallelism

2017-10-04 Thread r. r.
e than 1 Kafka > partition to process multiple documents at the same time. Make also sure to > send the documents to different partitions. > > Regards, > Timo > > > On 10/2/17 at 6:46 PM, r. r. wrote: > > Hello >

kafka consumer parallelism

2017-10-02 Thread r. r.
Hello, I'm running a job with "flink run -p5" and additionally set env.setParallelism(5). The source of the stream is Kafka; the job uses FlinkKafkaConsumer010. In the Flink UI, though, I notice that if I send 3 documents to Kafka, only one 'instance' of the consumer seems to receive Kafka's record and
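Since Kafka partitions are assigned to source subtasks, a source with parallelism 5 reading a topic where only one partition carries data will see records arrive at a single subtask. One hedged sketch of a workaround is to redistribute after the source with rebalance(); topic name, broker address, and group id below are placeholders, not taken from the original post:

```java
import java.util.Properties;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ParallelConsumeJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(5); // matches "flink run -p5"

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "doc-processors");          // placeholder

        env.addSource(new FlinkKafkaConsumer010<>("docs", new SimpleStringSchema(), props))
           // Even if all records land on one Kafka partition (hence one source
           // subtask), rebalance() round-robins them across all 5 map subtasks.
           .rebalance()
           .map(doc -> "processed: " + doc)
           .print();

        env.execute("parallel-consume");
    }
}
```

The alternative, as suggested in the reply above, is to spread the documents over more Kafka partitions so the source subtasks each get data directly.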

Re: how many 'run -c' commands to start?

2017-10-02 Thread r. r.
Thanks, Chesnay, that was indeed the problem. It also explains why -p5 was not working for me from the cmdline. Best regards Robert > Original message >From: Chesnay Schepler ches...@apache.org >Subject: Re: how many 'run -c' commands to start? >To:

Re: state of parallel jobs when one task fails

2017-09-29 Thread r. r.
Thanks a lot - wasn't aware of FailoverStrategy. Best regards Robert > Original message >From: Piotr Nowojski pi...@data-artisans.com >Subject: Re: state of parallel jobs when one task fails >To: "r. r." <rob...@abv.bg> >

Re: how many 'run -c' commands to start?

2017-09-29 Thread r. r.
ed that enough TaskManagers are > still alive to satisfy the resource requirements of the job. > > > Can you elaborate a bit more on what happened when you used the --detached > param? > > On 28.09.2017 16:33, r. r. wrote: > >

state of parallel jobs when one task fails

2017-09-29 Thread r. r.
Hello, I have a simple job with a single map() operation which I want to run on many documents in parallel in Flink. What will happen if one of the 'instances' of the job fails? This statement in the Flink docs confuses me: "In case of failures, a job switches first to failing where it cancels

Re: how many 'run -c' commands to start?

2017-09-28 Thread r. r.
uld only call `flink run ...` to submit a > job once; for simplicity i would submit it on the node where you started > the cluster. Flink will automatically distribute job across the cluster, > in smaller independent parts known as Tasks. > > Regards, > Chesnay

how many 'run -c' commands to start?

2017-09-28 Thread r. r.
Hello, I successfully ran a job with 'flink run -c', but this is for the local setup. How should I proceed with a cluster? Will Flink automagically instantiate the job on all servers - I hope I don't have to start 'flink run -c' on all machines. New to Flink and bigdata, so sorry for the