Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-26 Thread Sachin Mittal
Hi,
Thanks for sharing the info.

I am reading this document for more understanding:
http://kafka.apache.org/documentation.html#monitoring

Is there any special way I need to start my Kafka cluster or streams
application (or configure them) so that they report these metrics?

I suppose the cluster and the streams application report separate metrics.
That is, to collect the streams metrics I need to connect to the JMX port on
the machine where my streams application is running, right?

One issue I see is that the machines where the cluster and the streams
application are running are not accessible from outside, so I cannot run a
UI-based application like jconsole against them to view these metrics.

So what are the other possible options? Can I log the metric values to a log
file, or can I enable logging in general? If yes, where do I place my
log4j.properties? I tried making it part of the jar which has my main class,
but I don't see any logs getting generated.
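
To give an idea of what I am after, something like this rough sketch is what
I would call periodically from my application. It relies on the
KafkaStreams#metrics() call; the logger is just a placeholder for whatever
logging I get working:

import java.util.Map;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.streams.KafkaStreams;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StreamsMetricsDump {
    private static final Logger LOG = LoggerFactory.getLogger(StreamsMetricsDump.class);

    // Dump the current value of every metric exposed by a running KafkaStreams
    // instance; meant to be called periodically, e.g. from a scheduled task.
    public static void logMetrics(KafkaStreams streams) {
        for (Map.Entry<MetricName, ? extends Metric> e : streams.metrics().entrySet()) {
            MetricName n = e.getKey();
            // In 0.10.x Metric#value() returns the current value as a double.
            LOG.info("{} / {} = {}", n.group(), n.name(), e.getValue().value());
        }
    }
}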

Thanks
Sachin



On Fri, Jan 27, 2017 at 6:48 AM, Matthias J. Sax wrote:

> You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
> even more detailed).
>
> There is not a lot of documentation for 0.10.0 or 0.10.1, but it works
> the same way as for the consumer/producer metrics that are documented.
>
>
> -Matthias
>
> On 1/24/17 10:38 PM, Sachin Mittal wrote:
> > Hi All,
> > I am running a kafka streaming application with a simple pipeline of:
> > source topic -> group -> aggregate by key -> for each -> save to a sink.
> >
> > My source topic gets messages at a rate of 5000 - 1 messages per second.
> > During peak load we see the delay reaching 3 million messages.
> >
> > So I need to figure out where delay might be happening.
> >
> > 1.  Is there any mechanism in Kafka Streams to log the time spent within
> > each pipeline stage?
> >
> > 2.  Also, if I want to turn on custom logging to log some timings, how
> > can I do the same?
> >
> > I have a log4j.properties and I am packaging it inside the jar which has
> > the main class.
> > I place that jar in the libs folder of the Kafka installation.
> >
> > However, I see no logs generated under the logs folder.
> >
> > So where are we supposed to add the log4j.properties?
> >
> > Thanks
> > Sachin
> >
>
>


Re: How to log/analyze the consumer lag in kafka streaming application

2017-01-26 Thread Matthias J. Sax
You should check out Kafka Streams Metrics (for upcoming 0.10.2 they are
even more detailed).

There is not a lot of documentation for 0.10.0 or 0.10.1, but it works
the same way as for the consumer/producer metrics that are documented.
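
If you cannot attach jconsole remotely, you can also read the metrics
in-process. A rough sketch (the MBean domain, type, and attribute names below
are the ones the JMX reporter registers for the embedded consumer, so
double-check them against the version you run):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ConsumerLagFromJmx {
    // Query the local MBean server for the embedded consumer's max record lag.
    public static void printMaxLag() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName pattern =
            new ObjectName("kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*");
        for (ObjectName name : server.queryNames(pattern, null)) {
            Object lag = server.getAttribute(name, "records-lag-max");
            System.out.println(name.getKeyProperty("client-id") + " records-lag-max=" + lag);
        }
    }
}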


-Matthias

On 1/24/17 10:38 PM, Sachin Mittal wrote:
> Hi All,
> I am running a kafka streaming application with a simple pipeline of:
> source topic -> group -> aggregate by key -> for each -> save to a sink.
> 
> My source topic gets messages at a rate of 5000 - 1 messages per second.
> During peak load we see the delay reaching 3 million messages.
> 
> So I need to figure out where delay might be happening.
> 
> 1.  Is there any mechanism in Kafka Streams to log the time spent within
> each pipeline stage?
> 
> 2.  Also, if I want to turn on custom logging to log some timings, how can
> I do the same?
> 
> I have a log4j.properties and I am packaging it inside the jar which has
> the main class.
> I place that jar in the libs folder of the Kafka installation.
> 
> However, I see no logs generated under the logs folder.
> 
> So where are we supposed to add the log4j.properties?
> 
> Thanks
> Sachin
> 





Upgrade questions

2017-01-26 Thread Fernando Vega
I have a few questions regarding an upgrade that I'm attempting to perform.

- Currently we are running version 0.8.2_beta.
- We are trying to upgrade to 0.10.1.1.
- Our setup uses the following path
   /server/kafka
   where kafka is a symlink to kafka-{version}
- I attempted to perform the upgrade as you specify in the documentation:
http://kafka.apache.org/0100/documentation.html#upgrade

However, when I attempted this, the server.properties file that was in
kafka-0.8.2_beta was gone and a new file was under kafka-0.10.1.1.

I started the server without noticing this at first, and it started normally.
When I realized it, I stopped the server and copied the server.properties file
from another host (running the old version) into the new directory on this
server, adding the 2 lines mentioned in the instructions above and of course
changing the broker id. When I do this, the server does not start.
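
In case it matters, the two lines I am referring to are the protocol/format
overrides from the rolling-upgrade section, which as I read it are roughly:

# from the rolling-upgrade instructions: pin the old protocol and message
# format until all brokers are upgraded (0.8.2 being our current version)
inter.broker.protocol.version=0.8.2
log.message.format.version=0.8.2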

So my questions are:

Are any of the config files different from one version to another?
Are the options the same from one version to the other?

This is a sample of the file that the old cluster uses, and I would like to
know if these options are still valid for the new version:

## See http://kafka.apache.org/documentation.html#brokerconfigs for default
values.

# The id of the broker. This must be set to a unique integer for each
broker.
broker.id=6

# The port the socket server listens on
port=9092

# A comma separated list of directories under which to store log files
log.dirs=/data/1/kafka/datalog,/data/2/kafka/datalog,/data/3/kafka/datalog,/data/4/kafka/datalog,/data/5/kafka/datalog,/data/6/kafka/datalog,/data/7/kafka/datalog,/data/8/kafka/datalog

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.

zookeeper.connect=dwh-pipeline001:2181,dwh-pipeline002:2181,dwh-pipeline003:2181,dwh-pipeline004:2181,dwh-pipeline005:2181/kafka/data

# Additional configuration options may follow here
auto.leader.rebalance.enable=true
delete.topic.enable=true
log.retention.size=200
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576
default.replication.factor=2
auto.create.topics.enable=true
num.partitions=100
num.network.threads=3
num.io.threads=40
log.retention.hours=24
log.roll.hours=1
num.replica.fetchers=8
zookeeper.connection.timeout.ms=3
zookeeper.session.timeout.ms=3

Also, what are the steps or ways I can check that the upgrade worked fine,
that I'm running x or y version, and in general that things are working fine?
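
One way I thought of for verifying the version is reading it over JMX; a
sketch like the following (this assumes the broker exposes the standard
app-info MBean and has a JMX port open; host, port, and broker id are
placeholders):

import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerVersionCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the broker's JMX port (host and port are examples).
        JMXServiceURL url =
            new JMXServiceURL("service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        // The app-info MBean carries the running broker's version string.
        Object version = connector.getMBeanServerConnection()
            .getAttribute(new ObjectName("kafka.server:type=app-info,id=6"), "Version");
        System.out.println("Broker reports version: " + version);
        connector.close();
    }
}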

Thank you

*Fernando Vega*
Sr. Operations Engineer


Re: Kafka Multiple Consumer Group for Same Topic

2017-01-26 Thread Joris Meijer
Hi Senthil,

You can just try it yourself using kafka-consumer-perf-test.sh, which is
bundled with Apache Kafka. You can start two instances in parallel with
different groups and see what happens as you tune some parameters.
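
Just to illustrate the semantics rather than the performance: two consumers
with different group.id values each independently receive every record of the
topic. A minimal sketch (broker address and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TwoGroupsDemo {
    static KafkaConsumer<String, String> consumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId); // SYS1 and SYS2 each track their own offsets
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> c = new KafkaConsumer<>(props);
        c.subscribe(Collections.singletonList("ad_events"));
        return c;
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> sys1 = consumer("SYS1");
        KafkaConsumer<String, String> sys2 = consumer("SYS2");
        while (true) {
            // Both groups receive every record, since offsets are tracked per group.
            for (ConsumerRecord<String, String> r : sys1.poll(1000))
                System.out.println("SYS1 got " + r.value());
            for (ConsumerRecord<String, String> r : sys2.poll(1000))
                System.out.println("SYS2 got " + r.value());
        }
    }
}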

Joris

On Wed, Jan 25, 2017, 08:38 Senthil Kumar  wrote:

> Thanks Sharninder!
>
> Adding the dev group to know if they have done some benchmarking tests on
> a single consumer group vs multiple consumer groups on the same topic.
>
> Cheers,
> Senthil
>
>
> On Jan 24, 2017 10:48 PM, "Sharninder Khera"  wrote:
>
> I don't have benchmarks but multiple consumer groups are possible. For
> Kafka the performance should be similar or close to as having multiple
> consumers using a single group.
>
>
> _
> From: Senthil Kumar 
> Sent: Tuesday, January 24, 2017 10:38 PM
> Subject: Kafka Multiple Consumer Group for Same Topic
> To:  
> Cc:  
>
>
> Hi Team, sorry if the same question was already asked in this group!
>
> Say we have a topic => ad_events. I want to read events from the ad_events
> topic and send them to two different systems. This can be achieved by
> creating two Consumer Groups.
>
> Example :  Consumer Group SYS1 with 10 threads
>            Consumer Group SYS2 with 10 threads
>
> I would like to know whether having two different Consumer Groups will
> impact the performance of Kafka reads. I would also like to see the
> *benchmarking results* (numbers) of a single topic read with *one consumer
> group* vs a single topic with *two/three consumer groups*.
>
>
> Cheers,
> Senthil
>


Re: Reg: Kafka ACLS

2017-01-26 Thread Joris Meijer
Why would you want the benefits of ACLs without mutual SSL? I think that as
soon as you decide you need ACLs, you should also add encryption. It's
relatively easy to configure.
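
Independent of SSL: for ACLs to be enforced at all, the broker also needs an
authorizer configured, otherwise nothing is checked. Roughly (standard broker
properties, values illustrative):

# enable the built-in authorizer so ACLs are actually evaluated
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
# deny access when no ACL matches (false is also the default, set explicitly here)
allow.everyone.if.no.acl.found=false
# principals that bypass ACL checks, e.g. the broker/admin user
super.users=User:admin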

On Thu, Jan 26, 2017, 05:35 Manikumar  wrote:

> Yes, we can use Kafka ACL's with SASL/PLAIN mechanism.
>
> On Thu, Jan 26, 2017 at 2:38 AM, BigData dev wrote:
>
> > Hi,
> > I have a question: can we use Kafka ACLs with only the SASL/PLAIN
> > mechanism? Because after I enabled them, I am still able to
> > produce/consume from the topics.
> >
> > And one more observation: in kafka_jaas.conf there is no Client section,
> > so we get a WARN as below, since we don't have this kind of mechanism
> > with ZooKeeper. I just want to confirm: is this expected?
> >
> > *WARN SASL configuration failed:
> javax.security.auth.login.LoginException:
> > No JAAS configuration section named 'Client' was found in specified JAAS
> > configuration file: '/usr/iop/current/kafka-broker/conf/kafka_jaas.conf'.
> > Will continue connection to Zookeeper server without SASL authentication,
> > if Zookeeper server allows it. (org.apache.zookeeper.ClientCnxn)*
> >
> > KafkaClient {
> >
> > org.apache.kafka.common.security.plain.PlainLoginModule required
> >
> > username="alice"
> >
> > password="alice-secret";
> >
> > };
> >
> >
> > KafkaServer {
> >
> > org.apache.kafka.common.security.plain.PlainLoginModule required
> >
> > username="admin"
> >
> > password="admin-secret"
> >
> > user_admin="admin-secret"
> >
> > user_alice="alice-secret";
> >
> > };
> >
> >
> > I see that the recommendation is SASL/PLAIN with SSL; can we use only the
> > SASL/PLAIN mechanism with ACLs?
> >
> > Thanks
> >
>


Kafka JDBC connector vs Sqoop

2017-01-26 Thread Buntu Dev
I'm looking for ways to do bulk/incremental imports from a MySQL database to
HDFS. Currently I have Sqoop doing the bulk import and creating a Hive
table.

I wanted to know the pros/cons of using the JDBC connector instead of Sqoop,
and whether any MySQL config changes are expected (like the binlog
configuration in the case of CDC connectors) to import insert/alter/delete
statements.
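
For context, the kind of setup I am considering with the JDBC source
connector looks roughly like this (property names as I understand the
Confluent JDBC source connector's incremental mode; connection details and
names are placeholders):

name=mysql-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://mysql-host:3306/mydb?user=connect&password=secret
table.whitelist=orders
# incremental import driven by a monotonically increasing column
mode=incrementing
incrementing.column.name=id
topic.prefix=mysql-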


Thanks!


Fwd: [DISCUSS] KIP-114: KTable materialization and improved semantics

2017-01-26 Thread Eno Thereska
Forwarding this thread to the users list too in case people would like to 
comment. It is also on the dev list.

Thanks
Eno

> Begin forwarded message:
> 
> From: "Matthias J. Sax" 
> Subject: Re: [DISCUSS] KIP-114: KTable materialization and improved semantics
> Date: 24 January 2017 at 19:30:10 GMT
> To: d...@kafka.apache.org
> Reply-To: d...@kafka.apache.org
> 
> That's not what I meant by "huge impact".
> 
> I refer to the actions related to materializing a KTable: creating a
> RocksDB store and a changelog topic -- users should be aware of the
> runtime implications, and this is better expressed by an explicit method
> call rather than implicitly triggered by using a different overload of
> a method.
> 
> 
> -Matthias
> 
> On 1/24/17 1:35 AM, Damian Guy wrote:
>> I think your definition of a huge impact and mine are rather different ;-P
>> Overloading a few methods is not really a huge impact IMO. It is also a
>> sacrifice worth making for the readability and usability of the API.
>> 
>> On Mon, 23 Jan 2017 at 17:55 Matthias J. Sax  wrote:
>> 
>>> I understand your argument, but do not agree with it.
>>> 
>>> Your first version (even if the "flow" is not as nice) is more explicit
>>> than the second version. Adding a stateStoreName parameter is quite
>>> implicit but has a huge impact -- thus, I prefer the rather more verbose
>>> but explicit version.
>>> 
>>> 
>>> -Matthias
>>> 
>>> On 1/23/17 1:39 AM, Damian Guy wrote:
 I'm not a fan of materialize. I think it interrupts the flow, i.e.,
 
 table.mapValues(..).materialize().join(..).materialize()
 compared to:
 table.mapValues(..).join(..)
 
 I know which one I prefer.
 My preference is still to provide overloaded methods where people can
 specify the store names if they want; otherwise we just generate them.
 
 On Mon, 23 Jan 2017 at 05:30 Matthias J. Sax wrote:
 
> Hi,
> 
> thanks for the KIP Eno! Here are my 2 cents:
> 
> 1) I like Guozhang's proposal about removing the store name from all KTable
> methods and generating internal names (however, I would do this as
> overloads). Furthermore, I would not force users to call .materialize()
> if they want to query a store, but add one more method .stateStoreName()
> that returns the store name if the KTable is materialized. Thus,
> .materialize() also does not necessarily need a storeName parameter (i.e.,
> we should have some overloads here).
> 
> I would also not allow providing a null store name (to indicate that no
> materialization is necessary) but would throw an exception instead.
> 
> This yields some simplification (see below).
> 
> 
> 2) I also like Guozhang's proposal about KStream#toTable()
> 
> 
> 3)
>> 
>>>  3. What will happen when you call materialize on a KTable that is
>>>  already materialized? Will it create another StateStore (provided the
>>>  name is different), or throw an Exception?
>> 
>> Currently an exception is thrown, but see below.
>> 
>> 
> 
> If we follow approach (1) from Guozhang, there is no need to worry about
> a second materialization, and no exception needs to be thrown. A call to
> .materialize() basically sets a "materialized flag" (i.e., an idempotent
> operation) and sets a new name.
> 
> 
> 4)
>>> Rename toStream() to toKStream() for consistency.
>> 
>> Not sure whether that is really required. We also use
>> `KStreamBuilder#stream()` and `KStreamBuilder#table()`, for example, and
>> don't care about the "K" prefix.
> 
> Eno's reply:
>> I think changing it to `toKStream` would make it absolutely clear what
>> we are converting it to.
>> 
>> I'd say we should probably change the KStreamBuilder methods (but not in
>> this KIP).
> 
> I would keep #toStream(). (see below)
> 
> 
> 5) We should not remove any methods but only deprecate them.
> 
> 
> 
> A general note:
> 
> I do not understand your comments under "Rejected Alternatives". You say "Have
> the KTable be the materialized view" was rejected. But your KIP actually
> does exactly this -- the changelog abstraction of KTable is secondary
> after those changes and the "view" abstraction is what a KTable is. And
> just to be clear, I like this a lot:
> 
> - it aligns with the name KTable
> - it aligns with stream-table-duality
> - it aligns with IQ
> 
> I would say that a KTable is a "view abstraction" (as materialization is
> optional).
> 
> 
> 
> -Matthias
> 
> 
> 
> 
> On 1/22/17 5:05 PM, Guozhang Wang wrote:
>> Thanks for the KIP Eno, I have a few meta comments and a few detailed
>> comments:
>> 
>> 1. I like the materialize() function in general, but I would like to see
>> how 

broker randomly shuts down

2017-01-26 Thread Chriss Vincelette
I know I'm reviving an old thread but did the original poster ever find the 
cause of this issue and figure out what the fix was?

I am running a cluster of 18 Kafka 0.9 brokers and three of them have been
behaving exactly this way once a week.

Pretty scary, because they fully resign from the ISR, dropping all of their
partitions, and stop, then require a full rebuild when restarted.

No smoking gun errors in the log. Just a clean shutdown triggered for no 
apparent reason.

using kafka log compaction withour key

2017-01-26 Thread Samy CHBINOU

Hello,

Is it possible to use log compaction without a key? I think in that case the
buffer will contain only a single data value? Is that correct?
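
For reference, the topic configuration I have in mind is simply a compacted
topic, roughly:

# topic-level setting for a compacted topic; compaction keeps the latest
# value per key, which is why I am asking what happens without a key
cleanup.policy=compact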


thanks