Flink not running properly.

2018-09-23 Thread Sarabjyotsingh Multani
Hello Admin,
When I run "tail -f log/flink-*-taskexecutor-*.out" on the command
line, I get the following error: "Invalid maximum direct memory size:
-XX:MaxDirectMemorySize=8388607T
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit."
Please help.


Re: Running Flink in Google Cloud Platform (GCP) - can Flink be truly elastic?

2018-09-23 Thread Konstantin Knauf
Hi Alexander,

Broadly speaking, what you are doing right now is in line with what is
currently possible with Apache Flink. Can you share a little bit more
information about your setup (K8s/Flink-Standalone?
Job-Mode/Session-Mode?)? You might find Gary's Flink Forward talk [1]
interesting. He demonstrates how a Flink job automatically scales out when
it is given more resources by the resource manager, e.g. Kubernetes. But
this is still work in progress.

Best,

Konstantin

[1]
https://data-artisans.com/flink-forward-berlin/resources/flink-as-a-library-and-still-as-a-framework


On Fri, Sep 21, 2018 at 5:42 PM Dawid Wysakowicz 
wrote:

> Hi Alexander,
>
> I've redirected your question to the user mailing list. The community
> list is meant for "broader community discussions related to meetups,
> conferences, blog posts and job offers".
>
> The quick answer to your question is that dynamic scaling of Flink jobs is
> a work in progress. Maybe Gary or Till (cc'ed) can share some more details
> on that topic.
>
> Best,
>
> Dawid
>
>
> On 21/09/18 17:25, alexander.gard...@rbs.com.INVALID wrote:
> > Hi
> >
> > I'm trying to understand what it means to run a Flink cluster inside the
> Google Cloud Platform and whether it can act in an "elastic" way; if the
> cluster needs more resources to accommodate a sudden demand or increase in
> Flink jobs, will GCP automatically detect this and spool up more Task
> Managers to provide extra task slots?
> >
> > If we consider the following two simple use cases, how would GCP address
> them?
> >
> >
> > 1) No free task slots to run new flink jobs
> >
> > 2) A slow flink job needs an increased parallelism to improve
> throughput
> >
> > Currently, we'd handle the above use cases by:
> >
> >
> > 1) Knowing that the job failed due to "no free slots" (from the
> exception text), we'd add a new task manager and rerun the job once task
> slots were available.
> >
> > 2) We'd monitor the speed of the job ourselves, stop the job,
> specify which components (operators) in the stream require an increase in
> parallelism (for example via job properties), then relaunch the job; if not
> enough slots were available, we'd have to consider adding extra task
> managers.
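For illustration, a minimal sketch of what driving operator parallelism from job
properties (use case 2 above) could look like; the property name and the toy
pipeline are assumptions for the example, not taken from the setup described here:

```java
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismFromProperties {
    public static void main(String[] args) throws Exception {
        // Relaunch the job with e.g.: flink run ... --mapParallelism 8
        ParameterTool params = ParameterTool.fromArgs(args);
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
           .map(String::toUpperCase)
           // Parallelism of this operator comes from a job property, raised on relaunch.
           .setParallelism(params.getInt("mapParallelism", 4))
           .print();

        env.execute("parallelism-from-properties");
    }
}
```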
> >
> >
> > So my question is...can Google Cloud Platform (GCP) automatically launch
> extra TMs to handle the above?
> >
> > If we proposed to run a Flink cluster in a GCP container, can GCP make
> Flink behave dynamically elastic in the same way that Google DataFlow
> apparently can?
> >
> > Regards
> >
> >
> > Alex
> >
> >
>
>
>

-- 

Konstantin Knauf | Solution Architect

data Artisans


Re: Between Checkpoints in Kafka 11

2018-09-23 Thread Harshvardhan Agrawal
Hi,

Can someone please help me understand how the exactly-once semantics work
with Kafka 11 in Flink?

Thanks,
Harsh

On Tue, Sep 11, 2018 at 10:54 AM Harshvardhan Agrawal <
harshvardhan.ag...@gmail.com> wrote:

> Hi,
>
> I was going through the blog post on how TwoPhaseCommitSink function works
> with Kafka 11. One of the things I don’t understand is: What is the
> behavior of the Kafka 11 Producer between two checkpoints? Say that the
> time interval between two checkpoints is set to 15 minutes. Will Flink
> buffer all records in memory in that case and start writing to Kafka when
> the next checkpoint starts?
>
> Thanks!
> --
> Regards,
> Harshvardhan
>


-- 

Regards,
Harshvardhan Agrawal
267.991.6618 | LinkedIn


Re: S3 connector Hadoop class mismatch

2018-09-23 Thread Stephan Ewen
There is a Pull Request to enable the new streaming sink for Hadoop < 2.7,
so it may become an option in the next release.

Thanks for bearing with us!

Best,
Stephan


On Sat, Sep 22, 2018 at 2:27 PM Paul Lam  wrote:

>
> Hi Stephan!
>
> Unfortunately, I'm on Hadoop 2.6, so I have to stick to the old
> bucketing sink. I made it work by explicitly setting the Hadoop conf for the
> bucketing sink in the user code.
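A rough sketch of that workaround, assuming the flink-connector-filesystem
dependency; the bucket path and credentials are placeholders. The idea is to hand
the S3 settings straight to the BucketingSink via setFSConfig(), so the Hadoop
FileSystem it creates itself does not depend on the cluster-wide configuration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;

public class S3BucketingJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> stream = env.fromElements("record-1", "record-2");

        // Hadoop configuration passed explicitly to the sink (placeholder credentials).
        org.apache.hadoop.conf.Configuration hadoopConf = new org.apache.hadoop.conf.Configuration();
        hadoopConf.set("fs.s3a.access.key", "<access-key>");
        hadoopConf.set("fs.s3a.secret.key", "<secret-key>");

        BucketingSink<String> sink = new BucketingSink<>("s3a://my-bucket/output");
        sink.setFSConfig(hadoopConf); // used when the sink initializes the S3A filesystem itself

        stream.addSink(sink);
        env.execute("bucketing-sink-s3");
    }
}
```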
>
> Thank you very much!
>
> Best,
> Paul Lam
>
>
> Stephan Ewen  于2018年9月21日周五 下午6:30写道:
>
>> Hi!
>>
>> The old bucketing sink does not work with the Flink file systems; it only
>> works with Hadoop's direct file system support. IIRC it grabs the Flink
>> FileSystem (which creates s3a) to get the Hadoop config etc. and then
>> creates the Hadoop FileSystem (s3a again).
>>
>> The new streaming file sink will use the Flink FileSystem support, which is
>> important for more efficient streaming fault tolerance. S3 support for it
>> will be part of Flink 1.7.
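For reference, a minimal sketch of the new StreamingFileSink introduced in Flink
1.6 (row format on a non-S3 path here, since S3 support is what lands in 1.7); the
output path and pipeline are illustrative only:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class StreamingFileSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // the sink finalizes in-progress files on checkpoints

        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("hdfs:///tmp/streaming-file-sink-out"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        env.fromElements("record-1", "record-2").addSink(sink);
        env.execute("streaming-file-sink-demo");
    }
}
```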
>>
>> Best,
>> Stephan
>>
>>
>> On Fri, Sep 21, 2018 at 10:41 AM Paul Lam  wrote:
>>
>>> Hi Stefan, Stephan,
>>>
>>> Yes, the `hadoop.security.group.mapping` option is explicitly set
>>> to `org.apache.hadoop.security.LdapGroupsMapping`. Guess that was why the
>>> classloader found an unshaded class.
>>>
>>> I don't have permission to change the Hadoop cluster configuration,
>>> so I modified `core-default-shaded.xml` and marked the option as final
>>> to solve the problem, after which the class loading exceptions were gone.
>>>
>>> But another problem came up (likely not related to the previous one):
>>>
>>> With the old bucketing sink (version 1.5.3), it seems that
>>> `org.apache.hadoop.fs.s3a.S3AFileSystem` is initialized twice before the
>>> task starts running. The first initialization is triggered by
>>> `org.apache.flink.fs.s3hadoop.S3FileSystemFactory` and works well, but
>>> the second is done by the bucketing sink itself and fails to pick up
>>> the `s3.*` parameters like the access key and the secret key.
>>>
>>> The stack traces are as below:
>>>
>>> ```
>>>
>>> com.amazonaws.AmazonClientException: Unable to load AWS credentials from 
>>> any provider in the chain
>>> at 
>>> com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
>>> at 
>>> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
>>> at 
>>> com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
>>> at 
>>> com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
>>> at 
>>> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
>>> at 
>>> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1307)
>>> at 
>>> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:426)
>>> at 
>>> org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:370)
>>> at 
>>> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
>>> at 
>>> org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
>>> at 
>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
>>> at 
>>> org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:254)
>>> at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
>>> at 
>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:712)
>>> at java.lang.Thread.run(Thread.java:748)
>>>
>>> ```
>>>
>>> I haven't figured out why the s3a filesystem needs to be initialized
>>> twice. And is it a bug that the bucketing sink does not use the filesystem
>>> factories to create its filesystem?
>>>
>>> Thank you very much!
>>>
>>> Best,
>>> Paul Lam
>>>
>>>
>>> 在 2018年9月20日,23:35,Stephan Ewen  写道:
>>>
>>> Hi!
>>>
>>> A few questions to diagnose/fix this:
>>>
>>>  Do you explicitly configure the "hadoop.security.group.mapping"?
>>>
>>>   - If not, this setting may have leaked in from a Hadoop config in the
>>> classpath. We are fixing this in Flink 1.7, to make this insensitive to
>>> such settings leaking in.
>>>
>>>   - If yes, then please try setting the config variable to
>>> "hadoop.security.group.mapping: 
>>> org.apache.flink.fs.s3hadoop.shaded.org.apache.hadoop.security.LdapGroupsMapping"?
>>>
>>> Please let us know if that works!
>>>
>>>
>>>
>>> On Thu, Sep 20, 2018 at 1:40 PM, Stefan Richter <
>>> s.rich...@data-artisans.com> wrote:
>>> Hi,
>>>
>>> I could not find any open Jira for the problem you describe. Could you
>>> please open one?
>>>
>>> Best,
>>> Stefan
>>>
>>> > Am 19.09.2018 um 09

Re: Strange behaviour with checkpointing and custom FilePathFilter

2018-09-23 Thread Averell
Hi Vino, and all,

I tried skipping the step that fetches the FileStatus, and found that the
problem no longer occurs. I guess doing that for every single one of the 100K+
files on S3 caused some issue with checkpointing.
I'm still trying to find the root cause, but with lower priority now.
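For what it's worth, a filter that decides purely on the path avoids any per-file
metadata lookups; the prefix check below is only an illustrative assumption, not the
actual filter logic from this job:

```java
import org.apache.flink.api.common.io.FilePathFilter;
import org.apache.flink.core.fs.Path;

// Decides from the path string alone, so no FileStatus / S3 HEAD request per file.
public class PathOnlyFilter extends FilePathFilter {
    private static final long serialVersionUID = 1L;

    @Override
    public boolean filterPath(Path filePath) {
        // Returning true means "skip this path"; here we skip hidden files (illustrative).
        String name = filePath.getName();
        return name.startsWith("_") || name.startsWith(".");
    }
}
```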

Thanks for your help.

Regards,
Averell   





Re: Between Checkpoints in Kafka 11

2018-09-23 Thread vino yang
Hi Harshvardhan,

In fact, Flink does not cache data between two checkpoints. It only calls
different operations at different points in time. These operations are
provided by the Kafka client, so it helps to understand how Kafka producer
transactions work.

In general,

1) TwoPhaseCommitSinkFunction#snapshotState pre-commits the old transaction
and begins a new one
2) TwoPhaseCommitSinkFunction#notifyCheckpointComplete commits the pending
transaction
3) TwoPhaseCommitSinkFunction#close aborts the current transaction
4) TwoPhaseCommitSinkFunction#initializeState may recoverAndCommit or
recoverAndAbort earlier transactions and begins a new one


*Looking at the source code of TwoPhaseCommitSinkFunction and
FlinkKafkaProducer011 will give you a better understanding of the whole
process.*

Note that preCommit triggers the Kafka transactional producer's flush(),
which sends out any records still buffered locally inside the Kafka client
(that buffering is not related to Flink). So the producer transaction does
not cache a checkpoint's worth of data locally and then send it all or
nothing; atomicity is not achieved that way.

On Kafka's side, within a producer transaction the data is sent to the
broker first, and the commit operation then makes that data visible to
consumers.
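To make the wiring concrete, here is a minimal sketch of enabling the transactional
producer; the topic, bootstrap servers and checkpoint interval are placeholders. The
transaction timeout has to cover the checkpoint interval (and stay within the
broker's transaction.max.timeout.ms), because the commit only happens in
notifyCheckpointComplete:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceKafka011Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // commits happen when checkpoints complete

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        // Must outlive the checkpoint interval; keep it within the broker's transaction.max.timeout.ms.
        props.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer011<String> producer = new FlinkKafkaProducer011<>(
                "output-topic", // placeholder topic
                new KeyedSerializationSchemaWrapper<>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE);

        env.fromElements("record-1", "record-2").addSink(producer);
        env.execute("exactly-once-kafka-011");
    }
}
```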

Thanks, vino.

Harshvardhan Agrawal  于2018年9月23日周日 下午11:48写道:

> Hi,
>
> Can someone please help me understand how does the exactly once semantic
> work with Kafka 11 in Flink?
>
> Thanks,
> Harsh
>
> On Tue, Sep 11, 2018 at 10:54 AM Harshvardhan Agrawal <
> harshvardhan.ag...@gmail.com> wrote:
>
>> Hi,
>>
>> I was going through the blog post on how TwoPhaseCommitSink function
>> works with Kafka 11. One of the things I don’t understand is: What is the
>> behavior of the Kafka 11 Producer between two checkpoints? Say that the
>> time interval between two checkpoints is set to 15 minutes. Will Flink
>> buffer all records in memory in that case and start writing to Kafka when
>> the next checkpoint starts?
>>
>> Thanks!
>> --
>> Regards,
>> Harshvardhan
>>
>
>
> --
>
> *Regards,Harshvardhan Agrawal*
> *267.991.6618 | LinkedIn *
>


Re: Flink not running properly.

2018-09-23 Thread vino yang
Hi,

According to the comment in the taskmanager.sh startup script:

# Long.MAX_VALUE in TB: This is an upper bound, much less direct
memory will be used
TM_MAX_OFFHEAP_SIZE="8388607T"


I think you need to confirm that the operating system and the JDK installed
on the TM are 64-bit: 8388607T is (2^23 - 1) * 2^40 bytes, i.e. roughly
Long.MAX_VALUE bytes, which a 32-bit JVM cannot accept as a direct memory limit.
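A quick probe like the one below shows the data model of the JDK that is actually
on the path; note that "sun.arch.data.model" is a HotSpot-specific property, so
treat the result as best-effort:

```java
public class JvmBitness {
    public static void main(String[] args) {
        // Prints "64" on a 64-bit HotSpot JVM, "32" otherwise.
        System.out.println("data model: " + System.getProperty("sun.arch.data.model"));
        System.out.println("os.arch:    " + System.getProperty("os.arch")); // e.g. "amd64"
    }
}
```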

Thanks, vino.

Sarabjyotsingh Multani  于2018年9月23日周日 下午9:59写道:

> Hello Admin,
>  When I run "tail -f log/flink-*-taskexecutor-*.out" in command
> line , I get the following error : "Invalid maximum direct memory size:
> -XX:MaxDirectMemorySize=8388607T
> The specified size exceeds the maximum representable size.
> Error: Could not create the Java Virtual Machine.
> Error: A fatal exception has occurred. Program will exit."
> Please help.
>
>
>


error closing kafka

2018-09-23 Thread yuvraj singh
Hi all,

I am getting this error with Flink 1.6.0, please help me.

2018-09-23 07:15:08,846 ERROR
org.apache.kafka.clients.producer.KafkaProducer   - Interrupted
while joining ioThread

java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1257)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:703)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:682)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:661)

at
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:319)

at
org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)

at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)

at
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477)

at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378)

at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)

at java.lang.Thread.run(Thread.java:745)

2018-09-23 07:15:08,847 INFO  org.apache.kafka.clients.producer.KafkaProducer
  - Proceeding to force close the producer since pending
requests could not be completed within timeout 9223372036854775807 ms.

2018-09-23 07:15:08,860 ERROR
org.apache.flink.streaming.runtime.tasks.StreamTask   - Error
during disposal of stream operator.

org.apache.kafka.common.KafkaException: Failed to close kafka producer

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:734)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:682)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:661)

at
org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:319)

at
org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)

at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)

at
org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477)

at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378)

at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.InterruptedException

at java.lang.Object.wait(Native Method)

at java.lang.Thread.join(Thread.java:1257)

at
org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:703)

... 9 more


Thanks

Yubraj Singh


Re: Flink 1.5.4 -- issues w/ TaskManager connecting to ResourceManager

2018-09-23 Thread alex
We started to see the same errors after upgrading to Flink 1.6.0 from 1.4.2. We
have one JM and 5 TMs on Kubernetes. The JM is running in HA mode. TaskManagers
sometimes lose their connection to the JM and hit the following error, just like
you do.

*2018-09-19 12:36:40,687 INFO 
org.apache.flink.runtime.taskexecutor.TaskExecutor- Could not
resolve ResourceManager address
akka.tcp://flink@flink-jobmanager:50002/user/resourcemanager, retrying in
1 ms: Ask timed out on
[ActorSelection[Anchor(akka.tcp://flink@flink-jobmanager:50002/),
Path(/user/resourcemanager)]] after [1 ms]. Sender[null] sent message of
type "akka.actor.Identify"..*

Once a TM starts reporting "Could not resolve ResourceManager", it cannot
recover by itself until I restart the TM pod.

*Here is the content of our flink-conf.yaml:*
blob.server.port: 6124
jobmanager.rpc.address: flink-jobmanager
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 4096
jobmanager.web.history: 20
jobmanager.archive.fs.dir: s3://our_path
taskmanager.rpc.port: 6121
taskmanager.heap.mb: 16384
taskmanager.numberOfTaskSlots: 10
taskmanager.log.path: /opt/flink/log/output.log
web.log.path: /opt/flink/log/output.log
state.checkpoints.num-retained: 3
metrics.reporters: prom
metrics.reporter.prom.class:
org.apache.flink.metrics.prometheus.PrometheusReporter

high-availability: zookeeper
high-availability.jobmanager.port: 50002
high-availability.zookeeper.quorum: zookeeper_instance_list
high-availability.zookeeper.path.root: /flink
high-availability.cluster-id: profileservice
high-availability.storageDir: s3://our_path

Any help will be greatly appreciated!





Re: error closing kafka

2018-09-23 Thread miki haiat
What are you trying to do? Can you share some code?
This is the reason for the exception:
"Proceeding to force close the producer since pending requests could not be
completed within timeout 9223372036854775807 ms."



On Mon, 24 Sep 2018, 9:23 yuvraj singh, <19yuvrajsing...@gmail.com> wrote:

> Hi all ,
>
>
> I am getting this error with flink 1.6.0 , please help me .
>
>
>
>
>
>
> 2018-09-23 07:15:08,846 ERROR
> org.apache.kafka.clients.producer.KafkaProducer   -
> Interrupted while joining ioThread
>
> java.lang.InterruptedException
>
> at java.lang.Object.wait(Native Method)
>
> at java.lang.Thread.join(Thread.java:1257)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:703)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:682)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:661)
>
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:319)
>
> at
> org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)
>
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)
>
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477)
>
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378)
>
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>
> at java.lang.Thread.run(Thread.java:745)
>
> 2018-09-23 07:15:08,847 INFO  org.apache.kafka.clients.producer.KafkaProducer
>   - Proceeding to force close the producer since pending
> requests could not be completed within timeout 9223372036854775807 ms.
>
> 2018-09-23 07:15:08,860 ERROR
> org.apache.flink.streaming.runtime.tasks.StreamTask   - Error
> during disposal of stream operator.
>
> org.apache.kafka.common.KafkaException: Failed to close kafka producer
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:734)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:682)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:661)
>
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:319)
>
> at
> org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)
>
> at
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:117)
>
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:477)
>
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:378)
>
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>
> at java.lang.Thread.run(Thread.java:745)
>
> Caused by: java.lang.InterruptedException
>
> at java.lang.Object.wait(Native Method)
>
> at java.lang.Thread.join(Thread.java:1257)
>
> at
> org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:703)
>
> ... 9 more
>
>
> Thanks
>
> Yubraj Singh
>