Re: Application mode deployment through API call

2022-05-24 Thread Vikash Dat
Similar to what Biao said, Application mode is okay if you only have a
single app, but when running multiple apps, session mode gives you better
control. In my experience, the CliFrontend is not as robust as the REST
API, and you will end up having to rebuild a very similar REST API anyway.
For the metaspace issue, have you tried adding the shared libraries to the
Flink lib folder?

On Mon, May 23, 2022 at 23:31 Shengkai Fang  wrote:

> Hi, all.
>
> > is there any plan in the Flink community to provide an easier way of
> deploying Flink with application mode on YARN
>
> Yes. Jark has already opened a ticket about how to use the SQL client to
> submit SQL in application mode[1]. What's more, FLIP-222 will let us
> manage jobs from SQL, including listing all submitted jobs and their web
> UIs[2].
>
> [1] https://issues.apache.org/jira/browse/FLINK-26541
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
>
>
>


Re: CICD

2021-01-03 Thread Vikash Dat
Could you not use the JM web address to utilize the REST API? You can
start/stop/savepoint/restore jobs and upload new jars via the REST API.
While I did not run on ECS (I ran on EMR), I was able to use the REST API
to do deployments.
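
For illustration, here is a minimal sketch of such a deployment step
against the Flink REST API, using Java 11's HttpClient from Scala. The JM
address, job/jar IDs, and savepoint paths are all placeholders, the jar
upload itself (a multipart POST to /jars/upload) is not shown, and a real
pipeline would parse the JSON responses (e.g. the savepoint trigger id)
instead of printing them:

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object FlinkRestDeploy {
  // Placeholder JobManager web address; use your JM's actual host:port.
  private val jm     = "http://jobmanager:8081"
  private val client = HttpClient.newHttpClient()

  private def post(path: String, json: String): String = {
    val req = HttpRequest.newBuilder(URI.create(jm + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()
    client.send(req, HttpResponse.BodyHandlers.ofString()).body()
  }

  def main(args: Array[String]): Unit = {
    val oldJobId = "placeholder-job-id" // id of the running job
    val newJarId = "placeholder-jar-id" // id returned by POST /jars/upload

    // 1. Stop the running job with a savepoint (returns a trigger id to
    //    poll via /jobs/:jobid/savepoints/:triggerid).
    println(post(s"/jobs/$oldJobId/stop",
      """{"targetDirectory": "s3://my-bucket/savepoints", "drain": false}"""))

    // 2. Run the newly uploaded jar, restoring from the completed savepoint.
    println(post(s"/jars/$newJarId/run",
      """{"savepointPath": "s3://my-bucket/savepoints/savepoint-abc123"}"""))
  }
}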

On Sun, Jan 3, 2021 at 19:09 Navneeth Krishnan 
wrote:

> Hi All,
>
> Currently we are using Flink in session cluster mode and we manually
> deploy the jobs, i.e. through the web UI. We use AWS ECS for running the
> Docker containers with two service definitions, one for the JM and the
> other for the TM. How is everyone managing the CICD process? Is there a
> better way to run a job in job cluster mode and use Jenkins to perform
> CICD?
>
> Any pointers on how this is being done would really help and greatly
> appreciated.
>
> Thanks,
> Navneeth
>


Flink Parquet Streaming FileSink with scala case class with optional fields error

2020-08-11 Thread Vikash Dat
I have defined a streaming file sink for parquet to store my scala case
class.

StreamingFileSink
  .forBulkFormat(
    new Path(appArgs.datalakeBucket),
    ParquetAvroWriters.forReflectRecord(classOf[Log])
  )
  .withBucketAssigner(new TransactionLogHiveBucketAssigner())
  .build()


where my case class is

case class Log(
  level: String,
  time_stamp: Option[Long] = None
)


When Flink tries to write a specific instance to parquet


Log("info",Some(159697595))


it throws the following error:


org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: required group time_stamp {
}
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
at org.apache.parquet.schema.GroupType.accept(GroupType.java:255)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:31)
at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
at org.apache.parquet.schema.MessageType.accept(MessageType.java:55)
at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:280)
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:283)
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:564)
at org.apache.flink.formats.parquet.avro.ParquetAvroWriters.createAvroParquetWriter(ParquetAvroWriters.java:87)
at org.apache.flink.formats.parquet.avro.ParquetAvroWriters.lambda$forReflectRecord$3c375096$1(ParquetAvroWriters.java:73)
at org.apache.flink.formats.parquet.ParquetWriterFactory.create(ParquetWriterFactory.java:57)
at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter$Factory.openNew(BulkPartWriter.java:103)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.rollPartFile(Bucket.java:222)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.write(Bucket.java:212)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.onElement(Buckets.java:274)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.invoke(StreamingFileSink.java:445)
at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:641)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:616)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:596)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:711)
at org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:664)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:730)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:708)
at org.apache.flink.streaming.api.operators.TimestampedCollector.collect(TimestampedCollector.java:53)
at org.apache.flink.streaming.api.functions.windowing.PassThroughWindowFunction.apply(PassThroughWindowFunction.java:36)
at org.apache.flink.streaming.runtime.operators.windowing.functions.InternalSingleValueWindowFunction.process(InternalSingleValueWindowFunction.java:46)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.emitWindowContents(WindowOperator.java:549)
at org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.processElement(WindowOperator.java:373)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:173)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.processElement(StreamTaskNetworkInput.java:151)
at org.apache.flink.streaming.runtime.io.StreamTaskNetworkInput.emitNext(StreamTaskNetworkInput.java:128)
at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:69)
at 

Re: Flink Kafka EXACTLY_ONCE causing KafkaException ByteArraySerializer is not an instance of Serializer

2020-07-31 Thread Vikash Dat
Thanks for the reply. I am currently using 1.10, but I also saw it happen
in 1.10.1 when experimenting. I have not tried 1.11 since EMR only has up
to 1.10 at the moment. Are there any known workarounds?

On Fri, Jul 31, 2020 at 02:42 Qingsheng Ren  wrote:

> Hi Vikash,
>
> It's a bug in the classloader used by the `abortTransaction()` method of
> `FlinkKafkaProducer` in Flink 1.10.0. I think it has been fixed in 1.10.1
> and 1.11 according to FLINK-16262. Are you using Flink version 1.10.0?
>
>
> Vikash Dat  于2020年7月30日周四 下午9:26写道:
>
>> Has anyone had success with using exactly_once in a kafka producer in
>> flink?
>> As of right now I don't think the code shown in the docs
>> (
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-producer
>> )
>> actually works.
>>
>>
>>
>> --
>> Sent from:
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>>
>
>
> --
> Best Regards,
>
> *Qingsheng Ren*
>
> Electrical and Computer Engineering
> Carnegie Mellon University
>
> Email: renqs...@gmail.com
>


Re: Flink Kafka EXACTLY_ONCE causing KafkaException ByteArraySerializer is not an instance of Serializer

2020-07-30 Thread Vikash Dat
Has anyone had success with using exactly_once in a kafka producer in flink?
As of right now I don't think the code shown in the docs
(https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html#kafka-producer)
actually works.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Flink Kafka EXACTLY_ONCE causing KafkaException ByteArraySerializer is not an instance of Serializer

2020-07-29 Thread Vikash Dat
I'm using Flink 1.10 and Kafka (AWS MSK) 2.2 and trying to build a simple
app that consumes from one Kafka topic and produces events into another
topic. I would like to utilize the exactly_once semantic; however, I am
experiencing the following error:

org.apache.kafka.common.KafkaException: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:593)
at java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:677)
at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:735)
at java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:159)
at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:173)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:485)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:650)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.abortTransactions(FlinkKafkaProducer.java:1099)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.initializeState(FlinkKafkaProducer.java:1036)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:284)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeStateAndOpen(StreamTask.java:1006)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$0(StreamTask.java:454)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:449)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:461)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:707)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:532)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka producer
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:432)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:298)
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer.<init>(FlinkKafkaInternalProducer.java:76)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.lambda$abortTransactions$2(FlinkKafkaProducer.java:1107)
at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1556)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
at java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:290)
at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: org.apache.kafka.common.KafkaException: class org.apache.kafka.common.serialization.ByteArraySerializer is not an instance of org.apache.kafka.common.serialization.Serializer
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:374)
at org.apache.kafka.common.config.AbstractConfig.getConfiguredInstance(AbstractConfig.java:392)
at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:359)
... 12 more

My producer is defined as

new FlinkKafkaProducer[String](
  appArgs.authTxnTopic, // target topic
  new KeyedSerializationSchemaWrapper[String](
    new SimpleStringSchema()
  ), // serialization schema
  kafkaProdProps, // producer config
  FlinkKafkaProducer.Semantic.EXACTLY_ONCE
)


If I remove the exactly_once semantic as below, it works:

new FlinkKafkaProducer[String](
  appArgs.authTxnTopic, // target topic
  new KeyedSerializationSchemaWrapper[String](
    new SimpleStringSchema()
  ), // serialization schema
  kafkaProdProps // producer config
)
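
For reference, and separate from the classloader error above: the Flink
Kafka connector docs note that EXACTLY_ONCE requires attention to the
producer's transaction timeout, since Flink defaults transaction.timeout.ms
to 1 hour while brokers default transaction.max.timeout.ms to 15 minutes. A
minimal sketch of the producer properties, with a placeholder broker
address:

import java.util.Properties

// Sketch of producer config for EXACTLY_ONCE; broker address is a placeholder.
val kafkaProdProps = new Properties()
kafkaProdProps.setProperty("bootstrap.servers", "broker:9092")
// Keep the producer's transaction timeout at or below the broker's
// transaction.max.timeout.ms (15 minutes by default).
kafkaProdProps.setProperty("transaction.timeout.ms", "900000")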

Re: Flink on yarn : yarn-session understanding

2020-06-16 Thread Vikash Dat
YARN will assign a random port when Flink is deployed. To get the port, run
`yarn application -list` and look at the tracking URL assigned to your
Flink cluster. The port in that URL is the one you need to use for the REST
API.
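
If you want to discover it programmatically rather than via the CLI, here
is a minimal sketch using Hadoop's YarnClient API (assuming the YARN client
jars and a HADOOP_CONF_DIR with yarn-site.xml are available; the
"Apache Flink" application type is how Flink registers itself with YARN):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.yarn.client.api.YarnClient
import scala.collection.JavaConverters._

object FindFlinkTrackingUrl {
  def main(args: Array[String]): Unit = {
    val yarn = YarnClient.createYarnClient()
    yarn.init(new Configuration()) // reads yarn-site.xml from HADOOP_CONF_DIR
    yarn.start()

    // Equivalent of `yarn application -list`: print each Flink app's
    // tracking URL, which contains the randomly assigned REST port.
    yarn.getApplications.asScala
      .filter(_.getApplicationType == "Apache Flink")
      .foreach(app => println(s"${app.getName} -> ${app.getTrackingUrl}"))

    yarn.stop()
  }
}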

On Tue, Jun 16, 2020 at 08:49 aj  wrote:

> Ok, thanks for the clarification on yarn-session.
>
> I am trying to connect to the JobManager on 8081 but it's not connecting.
>
> [image: image.png]
>
> This is the address shown on my Flink job UI, and I am trying to connect
> to the REST address on 8081, but it's refusing the connection.
> On Tue, Jun 9, 2020 at 1:03 PM Andrey Zagrebin 
> wrote:
>
>> Hi Anuj,
>>
>> Afaik, the REST API should work for both modes. What is the issue? Maybe
>> some network problem connecting to the YARN application master?
>>
>> Best,
>> Andrey
>>
>> On Mon, Jun 8, 2020 at 4:39 PM aj  wrote:
>>
>>> I am running some stream jobs that are always long-running. I am
>>> currently submitting each job as a standalone job on YARN.
>>>
>>> 1. I need to understand the advantage of using yarn-session and when I
>>> should use it.
>>> 2. Also, I am not able to access the REST API services. Is that because
>>> I am running standalone jobs over YARN? Does the REST API work only in
>>> yarn-session mode?
>>>
>>>
>>> --
>>> Thanks & Regards,
>>> Anuj Jain
>>>
>>>
>>> 
>>>
>>
>
> --
> Thanks & Regards,
> Anuj Jain
> Mob. : +91- 8588817877
> Skype : anuj.jain07
> 
>
>
> 
>