Log compaction with Kafka and how to send null or empty payloads

2018-05-31 Thread Guillermo Ortiz
I don't understand how log compaction works.

I have created and configured a topic and consumed from it:

kafka-topics --create --zookeeper localhost:2181 --replication-factor
1 --partitions 1 --topic COMPACTION10
kafka-topics --alter --zookeeper localhost:2181 --config
min.cleanable.dirty.ratio=0.01 --config cleanup.policy=compact
--config segment.ms=100 --config delete.retention.ms=100 --config
segment.bytes=1000 --topic  COMPACTION10

Later, I created a Scala program to insert (K, V) pairs from 1 to 100, with
V equal to K.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val data_amount = 100
val topic = "COMPACTION10"

// Producer configuration: serializer classes are given by name, as the
// producer config expects.
val kafkaConfiguration = new Properties
kafkaConfiguration.put("bootstrap.servers", "localhost:9092")
kafkaConfiguration.put("key.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")
kafkaConfiguration.put("value.serializer",
  "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](kafkaConfiguration)

// Send keys 1..100, each with the key as its value.
for (id <- 1 to data_amount) {
  println(id)
  val record = new ProducerRecord(topic, id.toString, id.toString)
  producer.send(record)
}
producer.close()

After executing this and consuming the topic, I got output like this:

1-1
2-2
3-3
4-4
5-5
6-6
7-7
8-8
9-9
10-10
11-11
12-12
13-13
14-14
15-15
16-16
17-17
18-18
19-19
20-20
21-21
22-22
23-23
24-24
25-25
26-26
27-27
57-57
58-58
59-59
60-60
61-61
62-62
63-63
64-64
65-65
66-66
67-67
68-68
Processed a total of 39 messages
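
The key-value lines above look like console-consumer output with key printing
enabled; assuming a `-` separator and the local broker, a command along these
lines would produce that format:

```shell
# Read the compacted topic from the beginning, printing "key-value" pairs.
# print.key and key.separator produce the "1-1" style lines shown above.
kafka-console-consumer --bootstrap-server localhost:9092 \
  --topic COMPACTION10 --from-beginning \
  --property print.key=true --property key.separator=-
```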

Why do I get only 39 messages and not 100? All of them have different keys,
so there shouldn't be any compaction.

If I send another 100 values (1 to 100), I get about 50 messages, including
some of the new values.
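
To make the confusion concrete, here is my mental model of compaction as a
pure function (an assumption about the expected behavior, not Kafka code):
only the latest value per key is kept, and a null value (tombstone)
eventually removes the key, so 100 distinct keys should survive as 100
messages.

```scala
// Hypothetical model of log compaction semantics (not Kafka's implementation):
// keep the most recent value per key; a None value acts as a tombstone.
def compact(log: Seq[(String, Option[String])]): Map[String, String] =
  log.foldLeft(Map.empty[String, String]) {
    case (state, (k, Some(v))) => state + (k -> v) // newer value replaces older
    case (state, (k, None))    => state - k        // tombstone removes the key
  }

// With 100 distinct keys there is nothing to compact away:
val log100 = (1 to 100).map(i => (i.toString, Option(i.toString)))
assert(compact(log100).size == 100)
```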

Later, I tried to send records with an empty value to trigger a delete on
compaction, but it doesn't seem possible to send a None or null value from
the producer. If I send "" as the value, the records are treated like new
values:

1-1
2-2
3-3
4-4
5-5
6-6
7-7
8-8
9-9
10-10
11-11
12-12
13-13
14-14
15-15
16-16
17-17
18-18
19-19
20-20
21-21
22-22
23-23
24-24
25-25
26-26
27-27
57-57
58-58
59-59
60-60
61-61
62-62
63-63
64-64
65-65
66-66
67-67
68-68
1-
2-
3-
4-
5-
6-
7-
8-
9-
10-
11-
12-
13-
14-
15-
16-
17-
18-
19-
20-
21-
22-
23-
24-
25-
26-
27-
28-
29-
30-
31-
32-
33-
34-
35-
36-
37-
Processed a total of 76 messages

How can I send empty values so that compaction deletes some keys?
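
From what I have read, the standard way to get a delete is a tombstone: a
record whose value is literally null (an empty string "" is just another
value, which matches what I saw above). A sketch in Scala, assuming the
StringSerializer passes null through unchanged:

```scala
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// A tombstone is a record with a null value (not ""). Once the cleaner runs
// and delete.retention.ms has passed, the key should disappear from the log.
def sendTombstone(producer: KafkaProducer[String, String],
                  topic: String, key: String): Unit = {
  val tombstone = new ProducerRecord[String, String](topic, key, null)
  producer.send(tombstone)
}
```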

I have also executed with these new parameters, with similar results:

kafka-topics --create --zookeeper localhost:2181 --replication-factor
1 --partitions 1 --topic COMPACTION11

kafka-topics --alter --zookeeper localhost:2181 --config
min.cleanable.dirty.ratio=0.01 --config cleanup.policy=compact
--config segment.ms=1000 --config delete.retention.ms=1 --config
segment.bytes=1000 --topic  COMPACTION11


Re: Kafka Connect in different nodes than Kafka.

2017-02-01 Thread Guillermo Ortiz
I was asking that exactly. Thank you ;)

2017-02-01 20:39 GMT+01:00 Hans Jespersen <h...@confluent.io>:

> If you are asking whether Kafka Connect technically needs the entire Apache
> Kafka distribution to run, then the answer is no, it does not, because
> Connectors just remotely connect to Kafka Brokers on separate machines.
>
> If you are asking if there is a separate distribution for a “connect node”
> that pre-packages only the jars and scripts needed to run Kafka Connect and
> not any of the other Kafka components, then the answer is unfortunately no
> at this time; you need to download and install the entire Kafka
> distribution in order to get the bits needed to run Kafka Connect.
>
> -hans
>
>
>
>
> > On Feb 1, 2017, at 11:08 AM, Guillermo Ortiz <konstt2...@gmail.com>
> wrote:
> >
> > Is it possible to use Kafka Connect on nodes where Kafka isn't
> > installed? I can't see anything in the documentation about installing it
> > on other nodes.
>
>


Kafka Connect in different nodes than Kafka.

2017-02-01 Thread Guillermo Ortiz
Is it possible to use Kafka Connect on nodes where Kafka isn't installed? I
can't see anything in the documentation about installing it on other nodes.


Kafka Connect in different hosts than Kafka

2017-02-01 Thread Guillermo Ortiz
Hello,

I'm going to use Kafka Connect (with Apache Ignite). I guess it's possible to
install Kafka Connect on machines other than the Kafka brokers; is it
recommended? Do you lose some data locality? Is Kafka Connect smart enough to
exploit data locality? How many resources does Kafka Connect consume?


Number of partitions and disks in a topic

2015-12-01 Thread Guillermo Ortiz
Hello,

I want to size a Kafka cluster with just one topic, and I'm going to process
the data with Spark and other applications.

If I have six hard drives per node, is Kafka smart enough to deal with them?
I guess memory is very important here, since all the data is cached in
memory. Is it possible to configure Kafka to use several directories, as HDFS
does, each one on a different disk?
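
On the multiple-directories point, the broker does have a log.dirs setting
that takes a comma-separated list of directories, which looks like the
HDFS-style layout I mean; a server.properties fragment might look like this
(the paths are made-up examples):

```properties
# server.properties: one data directory per physical disk.
# The broker spreads partition logs across these directories.
log.dirs=/disk1/kafka-logs,/disk2/kafka-logs,/disk3/kafka-logs,/disk4/kafka-logs,/disk5/kafka-logs,/disk6/kafka-logs
```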

I'm not sure about the number of partitions either. I have read
http://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
and they talk about numbers of partitions much higher than I had thought. Is
it normal to have a topic with 1000 partitions? I was thinking about two to
four partitions per node. Is my thinking wrong?

As I'm going to process the data with Spark, I could set the number of
partitions equal, at most, to the number of Spark executors, always thinking
of the future and sizing somewhat higher than that.