Hi folks,
As far as I know, Kafka Streams runs as a separate process that reads data from
a topic, transforms it, and writes it to another topic if needed. In this case, how
does this process support high-throughput streams, as well as load balancing in terms
of message traffic and computing resources for the stream?
For big files, which are quite common in HDFS, can Connect tasks process the same
file in parallel, the way MapReduce handles split files? I do not think so. In
that case, a Kafka Connect implementation has no advantage for reading a single big file
unless you also use MapReduce.
On Jan
Hi folks,
I am trying to start Kafka Connect in distributed mode as follows, and it fails
with the error below. Standalone mode works fine. This happens with both the 3.0.1
and 3.1 versions of Confluent's Kafka distribution. Does anyone know the cause of this error?
Thanks,
Will
security.protocol = PLAINTEXT
Hi folks,
How can I collect Kafka Connect metrics from Confluent? Is there an API I can
use?
In addition, if one file is very big, can multiple tasks work on the same
file simultaneously?
Thanks,
Will
target is to get
Flink to consume Avro data produced by Kafka Connect
> On Nov 2, 2016, at 7:36 PM, Will Du <will...@gmail.com> wrote:
>
>
> On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:
>
> Hi folks,
> I
On Nov 2, 2016, at 7:31 PM, Will Du <will...@gmail.com> wrote:
Hi folks,
I am trying to consume Avro data from Kafka in Flink. The data is produced by
Kafka Connect using AvroConverter. I have created an
AvroDeserializationSchema.java
<https://gist.github.com/d
Hi guys,
I am running a single-node broker in one cluster. When I run the
producer in another cluster, I get a connection timeout error.
I can reach port 9092 and other ports on the broker machine from the
producer; I just can't publish any messages. The command I used to run the
producer
Also, I can see the topic "speedx2" being created in the broker, but no
message data is coming through.
On Sun, Nov 29, 2015 at 7:00 PM, Yuheng Du <yuheng.du.h...@gmail.com> wrote:
> Hi guys,
>
> I was running a single node broker in a cluster. And when I run the
>
o connect for publishing. Kafka will
> tell the client about all the other brokers. But best practices state
> including all of them is best.
> -Erik
>
> On 9/14/15, 2:46 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >I am writing a kafka producer applicatio
I am writing a Kafka producer application in Java. I want the producer to
publish data to a cluster of 6 brokers. Is there a way to specify only a
load-balancing node rather than the full list of brokers?
For example, as in the Kafka benchmarking commands:
bin/kafka-run-class.sh
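Erik's answer above is that the client only needs a broker or two to bootstrap from; Kafka then tells it about the rest of the cluster, though listing more of them protects against one seed being down. A minimal sketch of building the producer settings from a partial seed list, assuming the new Java producer's `bootstrap.servers`, `key.serializer` and `value.serializer` keys; the host names are hypothetical. The resulting Properties would be handed to `new KafkaProducer<>(props)`, which is left out so the sketch compiles without the Kafka jar:

```java
import java.util.List;
import java.util.Properties;

final class BootstrapConfig {
    // Builds producer settings from a partial "seed" list of brokers.
    // The client contacts one of these, then learns the full cluster
    // membership from the metadata the broker returns.
    static Properties producerProps(List<String> seedBrokers) {
        if (seedBrokers.isEmpty()) {
            throw new IllegalArgumentException("need at least one seed broker");
        }
        Properties props = new Properties();
        // Comma-separated host:port pairs, e.g. "broker1:9092,broker2:9092".
        props.setProperty("bootstrap.servers", String.join(",", seedBrokers));
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical host names: two of the six brokers used as seeds.
        Properties props = producerProps(List.of("broker1:9092", "broker2:9092"));
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```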
ere was a burst of
> slower messages which caused this behavior, or if it was a consistent
> issue with that node.
> -Erik
>
>
> On 9/9/15, 2:24 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >So are you suggesting that the long delays happened in %
at least in my
> case, one of my brokers is further than the others.
> -Erik
>
> On 9/4/15, 1:06 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >No problem. Thanks for your advice. I think it would be fun to explore. I
> >only know how to program in ja
According to section 3.1 of the paper "Kafka: a Distributed Messaging
System for Log Processing":
"a message is only exposed to the consumers after it is flushed."
Is this still true in current Kafka, i.e., is a message only
available after it has been flushed to disk?
Thanks.
When I use 32 partitions, the latency with 4 brokers becomes larger than with 8
brokers.
So is it always true that more brokers give lower latency when the
number of partitions is at least the number of brokers?
Thanks.
On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.d
roughput first and low latency second. And
> it does a really good job at both.
>
> Disclaimer: I might not like linear algebra, but I do like statistics.
> Let me know if there are topics that need more explanation above that
> aren't covered by Gil's lecture.
> -Erik
>
> On 9/
Can't read it. Sorry
On Fri, Sep 4, 2015 at 12:08 PM, Roman Shramkov <roman_shram...@epam.com>
wrote:
> Её ай н Анны уйг
>
> sent from a mobile device, please excuse brevity and typos
>
>
> ----User Yuheng Du wrote
>
> According to the s
ts will be
> this slow or faster”, or for values that are high like 99.9%’ile, “0.1% of
> all events will be slower than this”.
> -Erik
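Erik's reading of percentiles can be made concrete with a small sketch. It uses the nearest-rank definition (one of several common percentile definitions, chosen here for simplicity and not necessarily what Kafka's tools use) and contrasts it with the mean, which a single slow record can drag far above what most records experience. The sample values are made up:

```java
import java.util.Arrays;

final class LatencyPercentiles {
    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    // Arithmetic mean, for comparison with the percentiles.
    static double mean(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().getAsDouble();
    }

    public static void main(String[] args) {
        // Made-up latencies in ms: nine fast records and one slow outlier.
        long[] samples = {5, 2, 900, 3, 4, 3, 2, 5, 4, 3};
        System.out.println("mean = " + mean(samples));
        System.out.println("p50  = " + percentile(samples, 50));
        System.out.println("p90  = " + percentile(samples, 90));
        System.out.println("p99  = " + percentile(samples, 99));
    }
}
```

Here the mean (93.1 ms) is larger than 9 of the 10 samples, while p50 is 3 ms and p99 is the 900 ms outlier itself.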
>
> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >Thank you Erik! That is helpful!
> >
o it might be a while…
> -Erik
>
>
> On 9/4/15, 12:55 PM, "Yuheng Du" <yuheng.du.h...@gmail.com> wrote:
>
> >Thanks for your reply Erik. I am running some more tests according to your
> >suggestions now and I will share my results here. Is it necessary
I am running a producer latency test. When using 92 producers on 92
physical nodes publishing to 4 brokers, the latency is slightly lower than
when using 8 brokers. I am using 8 partitions for the topic.
I have rerun the test and it gives me the same result: the 4-broker
scenario still has lower
Also, when I set the target throughput to 1 record/s, the actual
test results show an average of 579.86 records per second across all
my producers. How did that happen? Why is this number not 1?
Thanks.
On Tue, Aug 18, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
and your setup.
-Tao
On Tue, Aug 18, 2015 at 11:34 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Also, when I set the target throughput to 1 record/s, the actual
test results show an average of 579.86 records per second across all
my producers. How did that happen? Why is this number
latency will become meaningless for a
latency-purpose test.
On Tue, Aug 18, 2015 at 11:48 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
I see. Thank you Tao. But now I don't understand what Jay meant when he said
that my latency test only makes sense if I set a fixed throughput. Why do I
need to set a
fixed
records/sec).
-Jay
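Tao's point is that latency is meaningless at maximum throughput. Presumably this is because records then spend most of their time queued behind each other, and that queueing is why a latency run is paced at a fixed rate. Below is a minimal sketch of one way to pace sends from a single thread. It is not ProducerPerformance's actual throttling code. Following the guess in this thread that -1 means "as fast as possible", it treats any non-positive target as "no limit":

```java
final class FixedRateSchedule {
    // How long to wait (ns) before sending record number `sent` (0-based)
    // so the long-run rate stays at targetPerSec. Non-positive target = unthrottled.
    static long delayNanos(long sent, long elapsedNanos, double targetPerSec) {
        if (targetPerSec <= 0) {
            return 0L;
        }
        // Record i is due i / targetPerSec seconds after the start.
        long dueNanos = (long) (sent * 1_000_000_000.0 / targetPerSec);
        return Math.max(0L, dueNanos - elapsedNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        double target = 5.0; // hypothetical target: 5 records/s
        int records = 10;
        long start = System.nanoTime();
        for (long i = 0; i < records; i++) {
            long wait = delayNanos(i, System.nanoTime() - start, target);
            if (wait > 0) {
                Thread.sleep(wait / 1_000_000, (int) (wait % 1_000_000));
            }
            // A real loop would call producer.send(...) here.
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("sent %d records in %.2f s%n", records, seconds);
    }
}
```

Each record's due time is computed from the start rather than from the previous send. As a result, a late send is followed by catch-up sends instead of permanently lowering the rate.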
On Thu, Aug 13, 2015 at 12:18 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Thank you Alvaro,
How do I use sync producers? I am running the standard ProducerPerformance
test from Kafka to measure the latency of each message sent from the
producer to the broker only
unnecessarily). Also, you may want to increase batch.size further; you
will get even better throughput with more or less the same latency (as there is no shortage of
events in the test program).
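On the sync-producer question: with the new Java producer, `send()` returns a `Future`. Blocking on `get()` before sending the next record makes the producer effectively synchronous, so each record's producer-to-broker round trip can be timed on its own. The sketch below shows that pattern against a hypothetical `Sender` interface that stands in for `KafkaProducer.send(record)`, so it runs without the Kafka jar. From a single thread, only one record is in flight at a time, so batching no longer helps and throughput will be far below the async benchmark's:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class SyncSendTimer {
    // Hypothetical stand-in for a producer. Like KafkaProducer.send(record),
    // it returns a Future that completes once the broker acknowledges.
    interface Sender {
        Future<?> send(byte[] payload);
    }

    // Sends each payload synchronously and returns per-send latencies (ns).
    static List<Long> timeEachSend(Sender sender, List<byte[]> payloads)
            throws InterruptedException, ExecutionException {
        List<Long> latencies = new ArrayList<>();
        for (byte[] payload : payloads) {
            long t0 = System.nanoTime();
            sender.send(payload).get(); // block until acknowledged
            latencies.add(System.nanoTime() - t0);
        }
        return latencies;
    }

    public static void main(String[] args) throws Exception {
        // Fake sender that acknowledges after roughly 2 ms on another thread.
        Sender fake = payload -> CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        List<byte[]> payloads = List.of(new byte[100], new byte[100], new byte[100]);
        for (long ns : timeEachSend(fake, payloads)) {
            System.out.printf("%.2f ms%n", ns / 1e6);
        }
    }
}
```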
On Thu, Aug 13, 2015 at 1:13 PM Yuheng Du yuheng.du.h...@gmail.com
wrote:
Yes there is. But if we
on log.flush.interval.messages and log.flush.interval.ms, if the segment file
is in the page cache, consumers will still benefit from the page cache and the
OS won't read it again from disk.
On Thu, Aug 13, 2015 at 2:54 PM Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi
I am running an experiment where 92 producers are publishing data to 6
brokers and 10 consumers are reading the data online simultaneously.
What should I do to reduce the latency? Currently, when I run the producer
performance test, the average latency is around 10 s.
Should I disable log.flush? How to
Also, the latency results show no major difference between acks=0 and
acks=1. Why is that?
On Thu, Aug 13, 2015 at 11:51 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
I am running an experiment where 92 producers are publishing data to 6
brokers and 10 consumers are reading online data
Hi,
As I understand it, Kafka brokers store incoming messages in the
page cache as much as possible and then flush them to disk, right?
But in my experiment where 90 producers are publishing data to 6 brokers,
I see that the log directory on disk where the broker stores the data is
Hi,
I am running a test in which 92 producers each publish 53,000 records of
254 bytes to 2 brokers.
The average latency reported by each producer varies widely. For some
producers, the average latency is as low as 38 ms to send the 53,000 records,
but for other producers, the average latency is
Hi guys,
I was reading a paper today in which the latency of Kafka and RabbitMQ is
compared:
http://downloads.hindawi.com/journals/js/2015/468047.pdf
To my surprise, Kafka showed large variations in latency as the
number of records per second increased.
So I am curious about why
Hi,
I am running 40 producers on a 40-node cluster. The messages are sent to 6
brokers in another cluster. The producers are running the ProducerPerformance
test.
When 20 nodes are running, the throughput is around 13 MB/s, and when running
40 nodes, the throughput is around 9 MB/s.
I have set
, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Thank you! What performance impact will it have if I change
log.segment.bytes? Thanks.
On Mon, Jul 27, 2015 at 1:25 PM, Ewen Cheslack-Postava
e...@confluent.io
wrote:
I think log.cleanup.interval.mins was removed in the first 0.8 release
log.segment.bytes/log.roll.{ms,hours} and
log.retention.check.interval.ms.
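This thread asks about capping stored data (say 50 GB) or discarding it after about 10 seconds. The relevant broker settings are `log.retention.bytes`, `log.retention.ms` (or `log.retention.hours`) and `log.retention.check.interval.ms`. Two details are easy to miss. First, `log.retention.bytes` applies per partition, not per topic. Second, deletion happens a whole closed segment at a time, so data in the active segment survives until that segment rolls (`log.segment.bytes` / `log.roll.ms`). Below is a minimal sketch that turns a total budget into per-partition broker settings. The partition count and the 5-second check interval are assumptions for illustration:

```java
import java.util.Properties;

final class RetentionConfig {
    // log.retention.bytes is enforced per partition, so a total budget
    // for a topic is split evenly across its partitions (rounding down).
    static long perPartitionBytes(long totalBudgetBytes, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be > 0");
        }
        return totalBudgetBytes / partitions;
    }

    // Broker-level (server.properties) retention settings.
    static Properties retentionProps(long totalBudgetBytes, int partitions, long retentionMs) {
        Properties p = new Properties();
        p.setProperty("log.retention.bytes",
                Long.toString(perPartitionBytes(totalBudgetBytes, partitions)));
        p.setProperty("log.retention.ms", Long.toString(retentionMs));
        // Assumed value: check for deletable segments every 5 seconds.
        p.setProperty("log.retention.check.interval.ms", "5000");
        return p;
    }

    public static void main(String[] args) {
        long fiftyGiB = 50L * 1024 * 1024 * 1024;
        // Hypothetical topic with 8 partitions and a 10-second time limit.
        retentionProps(fiftyGiB, 8, 10_000L)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```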
On Fri, Jul 24, 2015 at 12:49 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi,
I am testing Kafka producer performance, so I created a queue and
wrote a large amount of data to that queue
, 2015 at 10:03 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
If I want to get higher throughput, should I increase
log.segment.bytes?
I don't see log.retention.check.interval.ms, but there is
log.cleanup.interval.mins; is that what you mean?
If I set log.roll.ms
prabhbha...@gmail.com
wrote:
Hi,
Have you tried with acks=1 and -1 as well?
Please share the numbers and the message size
Regards,
Prabcs
On Jul 27, 2015 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi,
I am running 40 producers on a 40-node cluster. The messages are sent to 6
I deleted the queue and recreated it before I ran the test. Things are
working after restarting the broker cluster, thanks!
On Fri, Jul 24, 2015 at 12:06 PM, Gwen Shapira gshap...@cloudera.com
wrote:
Does topic speedx1 exist?
On Fri, Jul 24, 2015 at 7:09 AM, Yuheng Du yuheng.du.h...@gmail.com
Hi,
I am trying to run 20 performance tests on 10 nodes using pbsdsh.
The messages are sent to a 6-broker cluster. It seems to work for a
while, but when I delete the test queue and rerun the test, the broker does not
seem to process incoming messages:
[yuhengd@node1739 kafka_2.10-0.8.2.1]$
Hi,
I am testing Kafka producer performance, so I created a queue and
wrote a large amount of data to that queue.
Is there a way to delete the data automatically after some time? Say,
whenever the data size reaches 50 GB or the retention time exceeds 10
seconds, it would be deleted so that my disk
PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Just wanna make sure: in server.properties, the configuration
log.dirs=/tmp/kafka-logs
specifies the directory where the log (data) is stored, right?
If I want the data to be saved elsewhere, this is the configuration I
need to change
Just wanna make sure: in server.properties, the configuration
log.dirs=/tmp/kafka-logs
specifies the directory where the log (data) is stored, right?
If I want the data to be saved elsewhere, this is the configuration I need
to change, right?
Thanks for answering.
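The `log.dirs` setting asked about here is where the broker keeps its data. It accepts a comma-separated list of directories, and the broker spreads partitions across them. When it is absent, the older single-directory `log.dir` is used, which defaults to /tmp/kafka-logs. The small sketch below resolves the effective list in that order, using hypothetical paths. It only parses the file's text and is not how the broker itself loads its configuration:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class LogDirs {
    // Returns the data directories configured in a server.properties text:
    // log.dirs wins, then log.dir, then the /tmp/kafka-logs default.
    static List<String> logDirs(String serverProperties) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(serverProperties));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        String raw = p.getProperty("log.dirs", p.getProperty("log.dir", "/tmp/kafka-logs"));
        List<String> dirs = new ArrayList<>();
        for (String d : raw.split(",")) {
            String trimmed = d.trim();
            if (!trimmed.isEmpty()) {
                dirs.add(trimmed);
            }
        }
        return dirs;
    }

    public static void main(String[] args) {
        // Hypothetical paths on two separate disks.
        String conf = "broker.id=0\nlog.dirs=/data1/kafka-logs,/data2/kafka-logs\n";
        System.out.println(logDirs(conf));
    }
}
```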
best,
*every* record waits that long.
Of course, these numbers are estimates that depend on my having used 1 ms, but
hopefully they make it clear why you can see relatively large latencies.
-Ewen
On Wed, Jul 15, 2015 at 1:38 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi,
I have run the end
of insight into the
issue. Though it is understandable that your specific results need to be
verified, it seems that the KIP-25 patch is functional and that I can use it
for my own benchmarking purposes. Is that correct? Thanks again!
On Tue, Jul 14, 2015 at 8:22 AM, Yuheng Du yuheng.du.h
(http://kafka.apache.org/documentation.html#consumerconfigs). The default
value listed in the documentation is 100 (ms).
To add Java heap space to the JVM, pass -Xmx$Size (the maximum heap size) as a JVM
option.
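To confirm that an -Xmx setting actually reached the JVM, the running process can report its own maximum heap. A minimal sketch follows. With the stock scripts, the heap option is usually passed through the KAFKA_HEAP_OPTS environment variable that bin/kafka-run-class.sh reads (e.g. `KAFKA_HEAP_OPTS="-Xmx2G" bin/kafka-run-class.sh ...`). Check your copy of the script, though, since this thread also mentions KAFKA_JVM_PERFORMANCE_OPTS:

```java
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB. When -Xmx is set,
    // this is close to (often slightly below) the configured value.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        // Run with e.g. `java -Xmx2g HeapCheck` and compare with the flag.
        System.out.println("max heap: " + maxHeapMiB() + " MiB");
    }
}
```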
On Wed, Jul 15, 2015 at 12:29 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Tao,
Thanks
at kafka.tools.TestEndToEndLatency$.main(TestEndToEndLatency.scala:69)
at kafka.tools.TestEndToEndLatency.main(TestEndToEndLatency.scala)
What command should I use to add Java heap space to the JVM? Thanks!
Yuheng
On Wed, Jul 15, 2015 at 3:29 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao
Hi,
I have run the end-to-end latency test and the ProducerPerformance test on
my Kafka cluster according to
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
In the end-to-end latency test, the latency was around 2 ms. In the
ProducerPerformance test, using batch size 8196 to send 50,000,000 records:
be put in consumer_fetch_max_wait? Thanks.
On Tue, Jul 14, 2015 at 5:21 PM, Tao Feng fengta...@gmail.com wrote:
I think the ProducerPerformance microbenchmark only measures from the client to the
brokers (producer to brokers) and provides latency information.
On Tue, Jul 14, 2015 at 11:05 AM, Yuheng Du
delay, and what
other components?
Thanks.
best,
Yuheng
On Wed, Jul 15, 2015 at 3:51 AM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Tao,
If I run the following command on the command line
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency 192.168.1.3:9092
192.168.1.1:2181
In the Kafka performance tests (https://gist.github.com/jkreps/c7ddb4041ef62a900e6c),
the TestEndToEndLatency results are typically around 2 ms, while
ProducerPerformance normally has an average latency of several hundred ms
when using batch size 8196.
Are both results talking about end to end
kafkatest/tests/benchmark_test.py
Definitely keep us posted about which parts are difficult, annoying, or
confusing about this process and we'll do our best to help.
Thanks,
Geoff
On Wed, Jul 15, 2015 at 12:49 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Jiefu,
Have you tried to run
.
Guozhang
On Wed, Jul 15, 2015 at 11:36 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
In the Kafka performance tests (https://gist.github.com/jkreps/c7ddb4041ef62a900e6c),
the TestEndToEndLatency results are typically around 2 ms, while
ProducerPerformance normally has an average
from producer
to broker, then to consumer.
I cannot remember the details now, but I think the EndToEndLatency test
records the latency as an average, hence it is small.
Guozhang
On Wed, Jul 15, 2015 at 12:28 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Guozhang,
Thank you for explaining
/tests).
The tool we're using to bring up the slave virtual machines is called
Vagrant, so the Vagrant steps in the quickstart are really telling you
how to install the virtual machines.
Hope that helps!
Cheers,
Geoff
On Wed, Jul 15, 2015 at 12:13 PM, Yuheng Du yuheng.du.h...@gmail.com
/trunk/bin/kafka-run-class.sh
KAFKA_JVM_PERFORMANCE_OPTS.
On Wed, Jul 15, 2015 at 12:51 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Tao,
If I run the following command on the command line
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency
192.168.1.3:9092
to the Kafka cluster
https://kafka.apache.org/documentation.html#newproducerconfigs
On Tue, Jul 14, 2015 at 7:29 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
means in the following test command:
bin/kafka
Also, I guess setting the target throughput to -1 means letting it go as
high as possible?
On Tue, Jul 14, 2015 at 10:36 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Thanks. If I set the acks=1 in the producer config options in
bin/kafka-run-class.sh
Does anyone know what bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092
means in the following test command:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1
bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864
pointed out. Do any of
your brokers fall out of the ISR when sending messages? It seems like your
setup should be fine, so I'm not entirely sure.
On Tue, Jul 14, 2015 at 1:31 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Jiefu,
I am performing these tests on a 6-node cluster in CloudLab
)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffe
(END)
On Tue, Jul 14, 2015 at 5:24 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
Hi Jiefu, Gwen,
I am running the Throughput versus stored data test:
bin/kafka-run-class.sh
:12 PM, Yuheng Du yuheng.du.h...@gmail.com wrote:
I checked the logs on the brokers; it seems that the ZooKeeper or
Kafka server process is not running on this broker. Thank you, guys. I will
see if it happens again.
On Tue, Jul 14, 2015 at 4:53 PM, JIEFU GONG jg...@berkeley.edu wrote:
Hmm
But is there a way to let Kafka overwrite the old data when the disk is
full? Or is it not necessary to use this figure? Thanks.
On Tue, Jul 14, 2015 at 10:14 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Jiefu,
I agree with you. I checked the hardware specs of my machines; each one of
them
to write data?
On Tue, Jul 14, 2015 at 2:27 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Also, the log in another broker (not the bootstrap) says:
[2015-07-14 15:18:41,220] FATAL [Replica Manager on Broker 1]: Error
writing to highwatermark file: (kafka.server.ReplicaManager)
[2015-07
:48,737] INFO [Kafka Server 1], shutting down
(kafka.server.KafkaServer)
I have checked that ZooKeeper is running fine. Can anyone help me figure out why I
got the error? Thanks.
On Tue, Jul 14, 2015 at 10:24 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
But is there a way to let Kafka overwrite the old data
Hi,
I am running the Kafka performance test at
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c
For the Three Producers, 3x async replication scenario, the command is
the same as for one producer:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test 5000 100 -1
:
Yuheng,
Yes, if you read the blog post it specifies that he's using three separate
machines. There's no reason the producers cannot be started at the same
time, I believe.
On Tue, Jul 14, 2015 at 11:42 AM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi,
I am running the performance test
Currently, Kafka's latency test measures the end-to-end latency between
producers and consumers.
Is there a way to measure the producer-to-broker and broker-to-consumer
delays separately?
Thanks.
org.apache.kafka.clients.tools.ProducerPerformance topic_name
num_records record_size target_records_sec [prop_name=prop_value]*
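The usage line above takes four positional arguments followed by any number of prop_name=prop_value producer settings. Below is a sketch of a parser for that shape; it is hypothetical and not the tool's own code. It shows why a mail-wrapped argument breaks the command. If `bootstrap.servers=` ends up separated from its host by a space or newline, the shell passes them as two arguments, and the first one has an empty value:

```java
import java.util.Properties;

final class PerfArgs {
    final String topic;
    final long numRecords;
    final int recordSize;
    final long targetRecordsPerSec; // assumed: negative means no throttling
    final Properties producerProps = new Properties();

    // Parses: topic_name num_records record_size target_records_sec [prop_name=prop_value]*
    PerfArgs(String[] args) {
        if (args.length < 4) {
            throw new IllegalArgumentException("usage: topic_name num_records record_size "
                    + "target_records_sec [prop_name=prop_value]*");
        }
        topic = args[0];
        numRecords = Long.parseLong(args[1]);
        recordSize = Integer.parseInt(args[2]);
        targetRecordsPerSec = Long.parseLong(args[3]);
        for (int i = 4; i < args.length; i++) {
            String arg = args[i];
            int eq = arg.indexOf('=');
            // Reject "name" with no '=', "=value", and "name=" with an empty value.
            if (eq <= 0 || eq == arg.length() - 1) {
                throw new IllegalArgumentException("expected prop_name=prop_value, got: " + arg);
            }
            producerProps.setProperty(arg.substring(0, eq), arg.substring(eq + 1));
        }
    }

    public static void main(String[] args) {
        PerfArgs parsed = new PerfArgs(new String[] {
                "test7", "5000", "100", "-1", "acks=1", "bootstrap.servers=broker1:9092"});
        System.out.println(parsed.topic + " " + parsed.producerProps);
    }
}
```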
On Tue, 14 Jul 2015 at 05:08 Yuheng Du yuheng.du.h...@gmail.com wrote:
I am using the binaries of kafka_2.10-0.8.2.1. Could that be the problem?
Should I use the source of kafka
Hi guys,
I am trying to replicate the test of benchmarking kafka at
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
.
When I run
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance
test7 5000 100 -1 acks=1
directory does the ProducerPerformance class reside in?
Thanks.
On Mon, Jul 13, 2015 at 4:37 PM, JIEFU GONG jg...@berkeley.edu wrote:
You may need to open up your run-class.sh in a text editor and modify the
classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du
-class.sh in a text editor and modify the
classpath -- I believe I had a similar error before.
On Mon, Jul 13, 2015 at 1:16 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi guys,
I am trying to replicate the test of benchmarking kafka at
http://engineering.linkedin.com/kafka/benchmarking
Hi Wan,
I tried to install this DCMonitor, but when I try to clone the project,
it gives me "Permission denied, the remote end hung up unexpectedly". Can
you provide any suggestions for this issue?
Thanks.
best,
Yuheng
On Mon, Mar 23, 2015 at 8:54 AM, Wan Wei flowbeha...@gmail.com wrote:
We
I am wondering where the Kafka cluster keeps the topic metadata (name,
partitions, replication, etc.). How does a server recover the topic's
metadata and messages after a restart, and what data will be lost?
Thanks to anyone who answers my questions.
best,
Yuheng
with topic
metadata as well. You can use zookeeper-shell.sh or zkCli.sh to check the ZooKeeper
nodes; /brokers/topics will give you the list of topics.
--
Harsha
On March 9, 2015 at 8:20:59 AM, Yuheng Du (yuheng.du.h...@gmail.com)
wrote:
I am wondering where the Kafka cluster keeps the topic metadata
://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html
--
Harsha
On March 9, 2015 at 8:39:00 AM, Yuheng Du (yuheng.du.h...@gmail.com)
wrote:
Harsha,
Thanks for the reply. So what if the ZooKeeper cluster fails? Will the topic
information be lost? What fault-tolerance mechanism does ZooKeeper offer?
best
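On the fault-tolerance question, a ZooKeeper ensemble keeps serving requests as long as a strict majority (a quorum) of its servers is up. Each server also persists its data to its own disk, so the ensemble survives the loss of any minority of servers. The arithmetic below is a small illustration. It also shows why ensembles are usually sized with an odd number of servers:

```java
final class ZkQuorum {
    // Smallest number of servers that forms a strict majority.
    static int quorumSize(int ensembleSize) {
        if (ensembleSize <= 0) {
            throw new IllegalArgumentException("ensemble size must be > 0");
        }
        return ensembleSize / 2 + 1;
    }

    // How many servers can fail while a majority is still up.
    static int tolerableFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("%d servers: quorum %d, tolerates %d failure(s)%n",
                    n, quorumSize(n), tolerableFailures(n));
        }
    }
}
```

A 4-server ensemble tolerates no more failures than a 3-server one (one each), which is why odd sizes are the norm.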
cluster.
Good luck!
On Thu, Mar 5, 2015 at 12:30 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Thank you Gwen,
I also need the Kafka cluster to continue providing message brokering service
to a Storm cluster after the benchmarking. I am fairly new to cluster
setups. So
with the results:
https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Gwen
On Thu, Mar 5, 2015 at 12:16 PM, Yuheng Du yuheng.du.h...@gmail.com
wrote:
Hi everyone,
I am trying to set up a kafka cluster consisting of three machines. I
wanna
Hi everyone,
I am trying to set up a Kafka cluster consisting of three machines. I want to
run a benchmarking program on them. Can anyone recommend a step-by-step
tutorial or instructions for how I can do it?
Thanks.
best,
Yuheng