How to fetch a Kafka message with a [KEY, VALUE] pair

2016-04-21 Thread prateek arora
Hi

I am new to Apache Flink and have started using Flink version 1.0.1.

In my scenario, the Kafka messages have a key-value pair of type [String, Array[Byte]].

I tried to use FlinkKafkaConsumer08 to fetch the data, but I don't know how to
write a DeserializationSchema for it.

val stream: DataStream[(String, Array[Byte])] = env.addSource(
  new FlinkKafkaConsumer08[(String, Array[Byte])]("a-0", , properties))
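For reference, the core of what the missing schema argument has to do is small: assuming the key bytes are UTF-8 text (as Kafka's StringSerializer produces) and the value is a raw byte array, decoding reduces to a string conversion plus a pass-through. A stand-alone sketch in plain Java (no Flink dependency; the sample key and value are made up):

```java
import java.nio.charset.StandardCharsets;

public class KeyValueDecode {
    // Decode the key bytes as a UTF-8 string (the format Kafka's
    // StringSerializer writes on the producer side).
    static String decodeKey(byte[] keyBytes) {
        return new String(keyBytes, StandardCharsets.UTF_8);
    }

    // The value is already a raw byte array, so it passes through untouched.
    static byte[] decodeValue(byte[] valueBytes) {
        return valueBytes;
    }

    public static void main(String[] args) {
        byte[] key = "42".getBytes(StandardCharsets.UTF_8); // hypothetical message key
        byte[] value = new byte[] {1, 2, 3};                // hypothetical payload bytes
        System.out.println(decodeKey(key));                 // prints 42
        System.out.println(decodeValue(value).length);      // prints 3
    }
}
```

In Flink 1.0 this logic would live inside a KeyedDeserializationSchema implementation (or be handled by TypeInformationKeyValueSerializationSchema, but only if the topic was also written in Flink's own serialization format).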

Please help me solve this problem.

Regards
Prateek


Re: Getting java.lang.Exception when try to fetch data from Kafka

2016-04-26 Thread prateek arora
Hi Robert,

I have a Java program that sends data into a Kafka topic. Below is the code:

private Producer<String, byte[]> producer = null;

Serializer<String> keySerializer = new StringSerializer();
Serializer<byte[]> valueSerializer = new ByteArraySerializer();
producer = new KafkaProducer<String, byte[]>(props, keySerializer,
    valueSerializer);

ProducerRecord<String, byte[]> imageRecord;
imageRecord = new ProducerRecord<String, byte[]>(streamInfo.topic,
    Integer.toString(messageKey), imageData);

producer.send(imageRecord);


Then I try to fetch the data in Apache Flink.
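For what it's worth, the EOFException in the stack trace quoted below is consistent with a format mismatch: Kafka's StringSerializer writes raw UTF-8 bytes, while Flink's TypeInformationKeyValueSerializationSchema expects keys in Flink's own length-prefixed StringValue format. The sketch below reproduces that failure mode in plain Java, with DataInputStream.readUTF standing in for the length-prefixed reader (the key text is made up):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FormatMismatch {
    public static void main(String[] args) throws IOException {
        // What Kafka's StringSerializer actually writes: raw UTF-8, no length prefix.
        byte[] kafkaBytes = "key-1".getBytes(StandardCharsets.UTF_8);

        // A length-prefixed reader misinterprets the first bytes as a length
        // field and then runs past the end of the buffer.
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(kafkaBytes))) {
            in.readUTF();
            System.out.println("decoded without error");
        } catch (EOFException e) {
            System.out.println("EOFException: length-prefixed reader ran past the buffer");
        }
    }
}
```

The same mismatch explains why Robert asks below whether the topic was written by Flink: the two sides must agree on one wire format.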

Regards
Prateek

On Mon, Apr 25, 2016 at 2:42 AM, Robert Metzger  wrote:

> Hi Prateek,
>
> were the messages written to the Kafka topic by Flink, using the
> TypeInformationKeyValueSerializationSchema ? If not, maybe the Flink
> deserializers expect a different data format of the messages in the topic.
>
> How are the messages written into the topic?
>
>
> On Fri, Apr 22, 2016 at 10:21 PM, prateekarora wrote:
>
>>
>> Hi
>>
>> I am sending the data using the KafkaProducer API:
>>
>>    imageRecord = new ProducerRecord<String, byte[]>(topic, messageKey, imageData);
>>    producer.send(imageRecord);
>>
>>
>> And in the Flink program I try to fetch the data using FlinkKafkaConsumer08.
>> Below is the sample code:
>>
>> def main(args: Array[String]) {
>>   val env = StreamExecutionEnvironment.getExecutionEnvironment
>>   val properties = new Properties()
>>   properties.setProperty("bootstrap.servers", ":9092")
>>   properties.setProperty("zookeeper.connect", ":2181")
>>   properties.setProperty("group.id", "test")
>>
>>   val readSchema = new TypeInformationKeyValueSerializationSchema[String, Array[Byte]](
>>       classOf[String], classOf[Array[Byte]], env.getConfig)
>>     .asInstanceOf[KeyedDeserializationSchema[(String, Array[Byte])]]
>>
>>   val stream: DataStream[(String, Array[Byte])] = env.addSource(
>>     new FlinkKafkaConsumer08[(String, Array[Byte])]("a-0", readSchema, properties))
>>
>>   stream.print
>>   env.execute("Flink Kafka Example")
>>   }
>>
>>
>> but I am getting the error below:
>>
>> 16/04/22 13:43:39 INFO ExecutionGraph: Source: Custom Source -> Sink:
>> Unnamed (1/4) (d7a151560f6eabdc587a23dc0975cb84) switched from RUNNING to
>> FAILED
>> 16/04/22 13:43:39 INFO ExecutionGraph: Source: Custom Source -> Sink:
>> Unnamed (2/4) (d43754a27e402ed5b02a73d1c9aa3125) switched from RUNNING to
>> CANCELING
>>
>> java.lang.Exception
>>     at org.apache.flink.streaming.connectors.kafka.internals.LegacyFetcher.run(LegacyFetcher.java:222)
>>     at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08.run(FlinkKafkaConsumer08.java:316)
>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:78)
>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
>>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:225)
>>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.io.EOFException
>>     at org.apache.flink.runtime.util.DataInputDeserializer.readUnsignedByte(DataInputDeserializer.java:298)
>>     at org.apache.flink.types.StringValue.readString(StringValue.java:771)
>>     at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:69)
>>     at org.apache.flink.api.common.typeutils.base.StringSerializer.deserialize(StringSerializer.java:28)
>>     at org.apache.flink.streaming.util.serialization.TypeInformationKeyValueSerializationSchema.deserialize(TypeInformationKeyValueSerializationSchema.java:105)
>>     at org.apache.flink.streaming.util.serialization.TypeInformationKeyValueSerializationSchema.deserialize(TypeInformationKeyValueSerializationSchema.java:39)
>>     at org.apache.flink.streaming.connectors.kafka.internals.LegacyFetcher$SimpleConsumerThread.run(LegacyFetcher.java:657)
>>
>>
>> Regards
>> Prateek
>>
>>
>>
>>
>>
>
>


Re: How to measure Flink performance

2016-05-09 Thread prateek arora
Hi
Thanks for the answer. How can I measure the performance of Flink, then?

I want to run my application on both Spark and Flink and measure the
performance, so I can check how fast Flink processes my data compared to
Spark.
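One framework-agnostic way to compare the two systems is to push an identical, fixed batch of records through each job and compute records per second from wall-clock time. The sketch below shows only the measurement pattern; the record count and per-record work are placeholders, and in a real Flink or Spark comparison you would time job submission to completion on the same input:

```java
public class ThroughputProbe {
    public static void main(String[] args) {
        int n = 1_000_000;        // hypothetical record count
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            checksum += i % 7;    // stand-in for per-record processing
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("processed %d records, ~%.0f records/sec%n", n, n / seconds);
        System.out.println("checksum=" + checksum); // keeps the loop from being optimized away
    }
}
```

Running the same probe pattern inside each framework's job gives a comparable end-to-end number, which complements the per-operator view discussed in the reply below.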

Regards
prateek

On Mon, May 9, 2016 at 2:17 AM, Ufuk Celebi  wrote:

> Hey Prateek,
>
> On Fri, May 6, 2016 at 6:40 PM, prateekarora wrote:
> > I have the information below from Spark. Can I get similar information
> > from Flink as well? If yes, how can I get it?
>
> You can get GC time via the task manager overview.
>
> The other metrics don't necessarily translate to Flink, because Flink does
> not execute your streaming program as mini-batches; instead, your program
> runs as continuous (long-lived) operators.
>
> This means, for example, that shuffles are continuously exchanging data,
> so you can't easily measure "how long the shuffle took". Also, the
> scheduler delay and serialization times are not that interesting for
> Flink, because their cost is amortized over one long-running job (it
> tends toward zero if the job runs long enough ;)) and you don't schedule
> and serialize the tasks multiple times.
>
> – Ufuk
>