Re: Connectors Information from Microsoft SQL Server to Kafka

2018-01-15 Thread Konstantine Karantasis
You might find this connector useful for your use case:

https://github.com/jcustenborder/kafka-connect-cdc-mssql
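
For the Kafka-to-MSSQL leg, the Confluent JDBC sink connector is a common
choice. As a rough sketch only (the topic, connection details, and
credentials below are placeholders; check the connector documentation for
the exact options), a sink configuration could look like:

  {
    "name": "mssql-sink",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
      "tasks.max": "1",
      "topics": "mssql-changes",
      "connection.url": "jdbc:sqlserver://target-host:1433;databaseName=targetdb",
      "connection.user": "kafka",
      "connection.password": "********",
      "insert.mode": "upsert",
      "pk.mode": "record_key",
      "auto.create": "true"
    }
  }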

Konstantine

On Tue, Dec 12, 2017 at 9:29 PM, harish reddy m 
wrote:

> Hi Team,
>
> We have a requirement to replicate data from an MSSQL source database to an
> MSSQL target database.
>
> We are using SQL Server 2014 Web edition as both the source and the target
> database. We want to replicate the data from the source database to the
> target database in real time.
>
> We enabled change tracking on the SQL Server side, as change data capture is
> not supported in the Web edition of SQL Server 2014.
>
> So we are using Kafka Connect to read the changes (updates/inserts) from
> the source database and then push them to the target database in real time.
>
> Could you let us know whether any Kafka connectors for MSSQL-to-Kafka and
> Kafka-to-MSSQL are available for the above scenario?
>
> We are also open to any suggestions for the above requirement.
>
> Thanks
> Harish
>


Re: Kafka Producer HA - using Kafka Connect

2018-01-15 Thread Konstantine Karantasis
If I understand correctly, and your question refers to general fault
tolerance, the answer is yes, Kafka Connect offers fault tolerance in
distributed mode.

You may start several Connect workers, and if a worker running one task with
your single producer fails unexpectedly, then this task will be restarted
on another worker and continue from where it left off.
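
As a minimal sketch (hosts and topic names are placeholders), each worker
runs with a distributed config that shares the same group.id and internal
topics, which is what lets tasks fail over between workers:

  # connect-distributed.properties (sketch; adjust hosts and replication
  # factors for your cluster)
  bootstrap.servers=broker1:9092,broker2:9092
  group.id=connect-cluster
  key.converter=org.apache.kafka.connect.json.JsonConverter
  value.converter=org.apache.kafka.connect.json.JsonConverter
  offset.storage.topic=connect-offsets
  config.storage.topic=connect-configs
  status.storage.topic=connect-status

Each machine then starts a worker with:

  bin/connect-distributed.sh connect-distributed.properties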

-Konstantine

On Thu, Nov 30, 2017 at 4:08 PM, sham singh 
wrote:

> We are looking at implementing Kafka producer HA,
> i.e. there are two producers which can produce the same data.
> The objective is to have high availability implemented for the Kafka
> producer,
> i.e. if Producer1 goes down, Producer2 kicks in and produces data
> starting from the offset committed by Producer1.
>
> Would using Kafka Connect help in this scenario,
> or would a custom solution have to be built?
>
> Appreciate your response on this.
>


Re: 1 to N transformers in Kafka Connect

2018-01-15 Thread Konstantine Karantasis
Indeed, there is no flattening operator in Kafka Connect's SMTs at the
moment. The 'apply' method in the Transformation interface accepts a single
record and returns another - transformed - record or null.
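
To make that concrete, here is a minimal sketch of a custom SMT (the class
name and the topic-prefixing logic are illustrative only); the signature of
apply() is what rules out emitting several records per input:

  import java.util.Map;
  import org.apache.kafka.common.config.ConfigDef;
  import org.apache.kafka.connect.connector.ConnectRecord;
  import org.apache.kafka.connect.transforms.Transformation;

  // Illustrative SMT that prefixes the topic name of each record.
  // Note the 1:1 shape of apply(): one record in, one record (or null) out.
  public class TopicPrefix<R extends ConnectRecord<R>> implements Transformation<R> {
      @Override
      public R apply(R record) {
          return record.newRecord("prefixed." + record.topic(), record.kafkaPartition(),
                  record.keySchema(), record.key(),
                  record.valueSchema(), record.value(), record.timestamp());
      }

      @Override
      public ConfigDef config() { return new ConfigDef(); }

      @Override
      public void configure(Map<String, ?> configs) { }

      @Override
      public void close() { }
  }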

Konstantine.

On Wed, Dec 27, 2017 at 8:25 PM, Ziliang Chen  wrote:

> Hi,
>
> May I ask if it is possible to transform one Kafka record into many Kafka
> Connect records?
> I know we have 1:1 transformations supported in Kafka Connect, but it
> appears to me that there are quite a few use cases which require a 1:N
> transformation.
>
> Thank you very much!
>
> --
> Regards, Zi-Liang
>
> Mail:zlchen@gmail.com
>


what are common ways to convert info on a web site into a log entry?

2018-01-15 Thread James Smyth
Hi Kafka people,

I am very new to Kafka, so perhaps my question is naive. I spent some time
searching around Kafka resources but only became more confused.

What are common ways to pull info from a web site and send it to Kafka to
become a log entry?
There is a web site that I want to pull a piece of data from once a month
and have that data written to a Kafka log. Consumers will be listening for
that message to do processing on it.
I am not sure about common ways to do this.

I am thinking I could have some scheduler (e.g. cron) wake up once a month
and trigger the pull of the data from the web site and then send it to a
Kafka stream.
Does Kafka have the ability to trigger events once a month, or is using
cron a better idea?
What should the scheduler trigger: a stand-alone batch job, or the running
of some service like a Kafka producer? Should I worry about a service
running all the time when it is likely to only do a few seconds of work
each month?

Many thanks,

James.

one machine that has four network interfaces

2018-01-15 Thread ??????
hi guys,
 I have a Linux (CentOS 7) machine with four network interfaces, and I'm
trying to build a pseudo-cluster on this machine.
The four cards correspond to four IPs (101, 104, 105, 106),
and the three brokers are configured with:
listeners=xxx.xxx.xxx.104:9090
listeners=xxx.xxx.xxx.105:9091
listeners=xxx.xxx.xxx.106:9092
and three ZooKeepers: zk1---xxx.104:2181, zk2---xxx.105:2182, zk3---xxx.106:2183.
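
(For reference, a broker listener normally carries a security-protocol prefix
such as PLAINTEXT://. A minimal sketch of one broker's server.properties for
this kind of setup, with placeholder addresses:)

  # Broker 1 of 3 on the same host; each broker needs a unique id,
  # port, and log directory.
  broker.id=1
  listeners=PLAINTEXT://xxx.xxx.xxx.104:9090
  advertised.listeners=PLAINTEXT://xxx.xxx.xxx.104:9090
  log.dirs=/var/lib/kafka/broker-1
  zookeeper.connect=xxx.xxx.xxx.104:2181,xxx.xxx.xxx.105:2182,xxx.xxx.xxx.106:2183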


I start the ZooKeepers first, and they run fine; then the Kafka brokers,
which also run fine.


When I produce data to this pseudo-cluster, the trouble starts:


sar -n DEV 1:
network-101 -- IO (0-1000 Mbps)
network-104 -- IO (0-10 Kbps)
network-105 -- IO (0-10 Kbps)
network-106 -- IO (0-10 Kbps)
lo -- IO (0-1000 Mbps)


When the produced throughput reaches 1000 Mbps, producing fails. Then I
unplug the network cable of 101 and continue:


sar -n DEV 1:
network-101 -- IO (0 bps)
network-104 -- IO (0-10 Kbps)
network-105 -- IO (0-10 Kbps)
network-106 -- IO (0-1000 Mbps)
lo -- IO (0-1000 Mbps)


What is happening, and why?
Could you give me some advice, guys?


This is urgent; I'm waiting online for replies.

Re: what are common ways to convert info on a web site into a log entry?

2018-01-15 Thread James Smyth
Thanks! I will write my reply to the mailing list once I have fiddled with
your suggestions.

James.

> On Jan 15, 2018, at 2:20 PM, Jacob Sheck  wrote:
> 
> You will need to find or create something to accomplish this.  Topics in
> Kafka primarily act as queues.  If you search the web for more information
> about "kafka connector http" you will find a few projects that do this.
> You could also take a look at the Confluent S3 Connector as an example of
> how to accomplish what you want to do.  If you are interested in a quick
> and dirty solution you could put a script using curl in cron to pull the
> web content and write it to a topic using the kafka-rest-proxy.
> 
> On Mon, Jan 15, 2018 at 1:52 PM James Smyth 
> wrote:
> 
>> Hi Kafka people,
>> 
>> I am very new to Kafka, so perhaps my question is naive. I spent some time
>> searching around Kafka resources but only became more confused.
>>
>> What are common ways to pull info from a web site and send it to Kafka to
>> become a log entry?
>> There is a web site that I want to pull a piece of data from once a month
>> and have that data written to a Kafka log. Consumers will be listening for
>> that message to do processing on it.
>> I am not sure about common ways to do this.
>>
>> I am thinking I could have some scheduler (e.g. cron) wake up once a month
>> and trigger the pull of the data from the web site and then send it to a
>> Kafka stream.
>> Does Kafka have the ability to trigger events once a month, or is using
>> cron a better idea?
>> What should the scheduler trigger: a stand-alone batch job, or the running
>> of some service like a Kafka producer? Should I worry about a service
>> running all the time when it is likely to only do a few seconds of work
>> each month?
>> 
>> Many thanks,
>> 
>> James.
>> 
>> 



Re: what are common ways to convert info on a web site into a log entry?

2018-01-15 Thread Jacob Sheck
You will need to find or create something to accomplish this.  Topics in
Kafka primarily act as queues.  If you search the web for more information
about "kafka connector http" you will find a few projects that do this.
You could also take a look at the Confluent S3 Connector as an example of
how to accomplish what you want to do.  If you are interested in a quick
and dirty solution you could put a script using curl in cron to pull the
web content and write it to a topic using the kafka-rest-proxy.
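
As a rough sketch of that quick-and-dirty route (the URL, topic name, jq
filter, and proxy address below are all placeholders):

  #!/bin/sh
  # Pull one value from a web page and post it to a Kafka topic via the
  # Confluent REST proxy (v2 API). Assumes jq is installed.
  VALUE=$(curl -s https://example.com/data.json | jq -c '.metric')
  curl -s -X POST \
    -H "Content-Type: application/vnd.kafka.json.v2+json" \
    --data "{\"records\":[{\"value\":${VALUE}}]}" \
    http://localhost:8082/topics/monthly-metrics

A monthly crontab entry such as "0 3 1 * * /usr/local/bin/pull-and-publish.sh"
(03:00 on the first of each month) would answer the scheduling question,
since Kafka itself has no built-in scheduler.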

On Mon, Jan 15, 2018 at 1:52 PM James Smyth 
wrote:

> Hi Kafka people,
>
> I am very new to Kafka, so perhaps my question is naive. I spent some time
> searching around Kafka resources but only became more confused.
>
> What are common ways to pull info from a web site and send it to Kafka to
> become a log entry?
> There is a web site that I want to pull a piece of data from once a month
> and have that data written to a Kafka log. Consumers will be listening for
> that message to do processing on it.
> I am not sure about common ways to do this.
>
> I am thinking I could have some scheduler (e.g. cron) wake up once a month
> and trigger the pull of the data from the web site and then send it to a
> Kafka stream.
> Does Kafka have the ability to trigger events once a month, or is using
> cron a better idea?
> What should the scheduler trigger: a stand-alone batch job, or the running
> of some service like a Kafka producer? Should I worry about a service
> running all the time when it is likely to only do a few seconds of work
> each month?
>
> Many thanks,
>
> James.
>
>


Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-15 Thread Vincent Rischmann
Hello again,

I had the same problem again today. While being stopped, the broker crashed;
then, after upgrading to 0.11.0.2 and restarting it, the broker is again
taking a really long time to recover. It's been almost 3 hours now and it's
not done.

I restarted the broker that previously crashed, but since that was a clean
shutdown it didn't recompute the indexes, so it's not really the same.
However, I think I can rule out a hardware problem, since another broker now
has the same problem.

After I've upgraded everything I might trigger a crash deliberately to see if
it reproduces the problem.

On Mon, Jan 15, 2018, at 10:04 AM, Ismael Juma wrote:
> Hi James,
> 
> There was a bug in 0.11.0.0 that could cause all segments to be scanned
> during a restart. I believe that was fixed in subsequent 0.11.0.x releases.
> 
> Ismael
> 
> On Fri, Jan 12, 2018 at 6:49 AM, James Cheng  wrote:
> 
> > We saw this as well, when updating from 0.10.1.1 to 0.11.0.1.
> >
> > Have you restarted your brokers since then? Did it take 8h to start up
> > again, or did it take its normal 45 minutes?
> >
> > I don't think it's related to the crash/recovery. Rather, I think it's due
> > to the upgrade from 0.10.1.1 to 0.11.0.1
> >
> > I think 0.11.0.0 introduced some new files in the log directories (maybe
> > the .snapshot files?). The first time 0.11.0.0 (or newer) started up, it
> > scanned the entire .log files to create... something. It scanned *all* the
> > segments, not just the most recent ones. I think that's why it took so
> > long. I think normally log recovery only looks at the most recent segments.
> >
> > We noticed this only on the FIRST boot when running 0.11+. From then on,
> > startups were our normal length of time.
> >
> > In your https://pastebin.com/tZqze4Ya, I see a line like:
> > [2018-01-05 13:43:34,776] INFO Completed load of log webapi-event-1 with 2
> > log segments, log start offset 1068104 and log end offset 1236587 in 9547
> > ms (kafka.log.Log)
> >
> > That line says that that partition took 9547ms (9.5 seconds) to
> > load/recover. We had large partitions that took 30 minutes to recover, on
> > that first boot. When I used strace to see what I/O the broker was doing,
> > it was reading ALL the segments for the partitions.
> >
> > -James
> >
> >
> >
> > > On Jan 11, 2018, at 10:56 AM, Vincent Rischmann 
> > wrote:
> > >
> > > If anyone else has any idea, I'd love to hear it.
> > >
> > > Meanwhile, I'll resume upgrading my brokers and hope it doesn't crash
> > and/or take so much time for recovery.
> > >
> > > On Sat, Jan 6, 2018, at 7:25 PM, Vincent Rischmann wrote:
> > >> Hi,
> > >>
> > >> just to clarify: this is the cause of the crash
> > >> https://pastebin.com/GuF60kvF in the broker logs, which is why I
> > >> referenced https://issues.apache.org/jira/browse/KAFKA-4523
> > >>
> > >> I had this crash some time ago and yesterday was in the process of
> > >> upgrading my brokers to 0.11.0.2 in part to address this bug,
> > >> unfortunately while stopping this particular broker it crashed.
> > >>
> > >> What I don't understand is why the recovery time after upgrading was so
> > >> high. A couple of months ago when a broker crashed due to this bug and
> > >> recovered it didn't take nearly as long. In fact, I never had a recovery
> > >> that long on any broker, even when they suffered a kernel panic or power
> > >> failure.
> > >>
> > >> We have quite a bit of data, however as I said with 0.10.1.1 it "only"
> > >> took around 45 minutes. The broker did do a lot of I/O while recovering
> > >> (to the point where even just running ls was painfully slow) but afair
> > >> it was the same every time a broker recovered. Volume of data hasn't
> > >> changed much since the last crash with 0.10.1.1, in fact I removed a lot
> > >> of data recently.
> > >>
> > >> Our brokers are running with 3 SATA disks in raid 0 (using mdadm), while
> > >> recovering yesterday atop reported around 200MB/s of read throughput.
> > >>
> > >> Here are some graphs from our monitoring:
> > >>
> > >> - CPU usage https://vrischmann.me/files/fr3/cpu.png
> > >> - Disk IO https://vrischmann.me/files/fr3/disk_io.png and
> > >> https://vrischmann.me/files/fr3/disk_usage.png
> > >>
> > >> On Sat, Jan 6, 2018, at 4:53 PM, Ted Yu wrote:
> > >>> Ismael:
> > >>> We're on the same page.
> > >>>
> > >>> 0.11.0.2 was released on 17 Nov 2017.
> > >>>
> > >>> By 'recently' in my previous email I meant the change was newer.
> > >>>
> > >>> Vincent:
> > >>> Did the machine your broker ran on experience power issue ?
> > >>>
> > >>> Cheers
> > >>>
> > >>> On Sat, Jan 6, 2018 at 7:36 AM, Ismael Juma  wrote:
> > >>>
> >  Hi Ted,
> > 
> >  The change you mention is not part of 0.11.0.2.
> > 
> >  Ismael
> > 
> >  On Sat, Jan 6, 2018 at 3:31 PM, Ted Yu  wrote:
> > 
> > bq. WARN Found a corrupted index file due to requirement failed: Corrupt
> 

Re: [DISCUSS] KIP-247: Add public test utils for Kafka Streams

2018-01-15 Thread Saïd Bouras
Hi Matthias,

I read the KIP, and it will be very helpful thanks to the changes. I don't
see, though, a part that handles topologies that use Avro schemas; is it in
the scope of the KIP?

I opened an issue two months ago in the schema-registry repo,
https://github.com/confluentinc/schema-registry/issues/651, which explains
that when testing topologies that use the schema registry, the mock schema
registry client is not thread safe, and thus deserialization in the
different processor nodes will not work...

In my unit tests I wrapped the mock schema registry client in a singleton,
but this solution is not satisfying enough.
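
For reference, the workaround looks roughly like this (a sketch only: the
class names come from the Confluent schema-registry test utilities, and
every serde in the test topology has to be built from the shared instance):

  import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient;
  import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;

  // Shares one mock schema registry client across all serdes in a test,
  // so every processor node sees the same registered schemas.
  public final class SharedMockSchemaRegistry {
      private static final SchemaRegistryClient INSTANCE = new MockSchemaRegistryClient();

      private SharedMockSchemaRegistry() { }

      public static SchemaRegistryClient getInstance() {
          return INSTANCE;
      }
  }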

Thanks in advance, regards :-)


On Fri, Jan 12, 2018 at 3:06 AM Matthias J. Sax 
wrote:

> Dear Kafka community,
>
> I want to propose KIP-247 to add public test utils to the Streams API.
> The goal is to simplify testing of Kafka Streams applications.
>
> Please find details in the wiki:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-247%3A+Add+public+test+utils+for+Kafka+Streams
>
> This is an initial KIP, and we hope to add more utility functions later.
> Thus, this KIP is not comprehensive but a first step. Of course, we can
> enrich this initial KIP if we think it falls too short. But we should
> not aim to be comprehensive to keep the scope manageable.
>
> In fact, I think we should add some more helpers to simplify result
> verification. I will update the KIP with this asap. Just wanted to start
> the discussion early on.
>
> An initial WIP PR can be found here:
> https://github.com/apache/kafka/pull/4402
>
> I also included the user-list (please hit "reply-all" to include both
> lists in this KIP discussion).
>
> Thanks a lot.
>
>
> -Matthias
>
>
>

-- 

Saïd BOURAS

Consultant Big Data
Mobile: 0662988731
Zenika Paris
10 rue de Milan 75009 Paris
Standard : +33(0)1 45 26 19 15 - Fax : +33(0)1 72 70 45 10


Re: Insanely long recovery time with Kafka 0.11.0.2

2018-01-15 Thread Ismael Juma
Hi James,

There was a bug in 0.11.0.0 that could cause all segments to be scanned
during a restart. I believe that was fixed in subsequent 0.11.0.x releases.

Ismael

On Fri, Jan 12, 2018 at 6:49 AM, James Cheng  wrote:

> We saw this as well, when updating from 0.10.1.1 to 0.11.0.1.
>
> Have you restarted your brokers since then? Did it take 8h to start up
> again, or did it take its normal 45 minutes?
>
> I don't think it's related to the crash/recovery. Rather, I think it's due
> to the upgrade from 0.10.1.1 to 0.11.0.1
>
> I think 0.11.0.0 introduced some new files in the log directories (maybe
> the .snapshot files?). The first time 0.11.0.0 (or newer) started up, it
> scanned the entire .log files to create... something. It scanned *all* the
> segments, not just the most recent ones. I think that's why it took so
> long. I think normally log recovery only looks at the most recent segments.
>
> We noticed this only on the FIRST boot when running 0.11+. From then on,
> startups were our normal length of time.
>
> In your https://pastebin.com/tZqze4Ya, I see a line like:
> [2018-01-05 13:43:34,776] INFO Completed load of log webapi-event-1 with 2
> log segments, log start offset 1068104 and log end offset 1236587 in 9547
> ms (kafka.log.Log)
>
> That line says that that partition took 9547ms (9.5 seconds) to
> load/recover. We had large partitions that took 30 minutes to recover, on
> that first boot. When I used strace to see what I/O the broker was doing,
> it was reading ALL the segments for the partitions.
>
> -James
>
>
>
> > On Jan 11, 2018, at 10:56 AM, Vincent Rischmann 
> wrote:
> >
> > If anyone else has any idea, I'd love to hear it.
> >
> > Meanwhile, I'll resume upgrading my brokers and hope it doesn't crash
> and/or take so much time for recovery.
> >
> > On Sat, Jan 6, 2018, at 7:25 PM, Vincent Rischmann wrote:
> >> Hi,
> >>
> >> just to clarify: this is the cause of the crash
> >> https://pastebin.com/GuF60kvF in the broker logs, which is why I
> >> referenced https://issues.apache.org/jira/browse/KAFKA-4523
> >>
> >> I had this crash some time ago and yesterday was in the process of
> >> upgrading my brokers to 0.11.0.2 in part to address this bug,
> >> unfortunately while stopping this particular broker it crashed.
> >>
> >> What I don't understand is why the recovery time after upgrading was so
> >> high. A couple of months ago when a broker crashed due to this bug and
> >> recovered it didn't take nearly as long. In fact, I never had a recovery
> >> that long on any broker, even when they suffered a kernel panic or power
> >> failure.
> >>
> >> We have quite a bit of data, however as I said with 0.10.1.1 it "only"
> >> took around 45 minutes. The broker did do a lot of I/O while recovering
> >> (to the point where even just running ls was painfully slow) but afair
> >> it was the same every time a broker recovered. Volume of data hasn't
> >> changed much since the last crash with 0.10.1.1, in fact I removed a lot
> >> of data recently.
> >>
> >> Our brokers are running with 3 SATA disks in raid 0 (using mdadm), while
> >> recovering yesterday atop reported around 200MB/s of read throughput.
> >>
> >> Here are some graphs from our monitoring:
> >>
> >> - CPU usage https://vrischmann.me/files/fr3/cpu.png
> >> - Disk IO https://vrischmann.me/files/fr3/disk_io.png and
> >> https://vrischmann.me/files/fr3/disk_usage.png
> >>
> >> On Sat, Jan 6, 2018, at 4:53 PM, Ted Yu wrote:
> >>> Ismael:
> >>> We're on the same page.
> >>>
> >>> 0.11.0.2 was released on 17 Nov 2017.
> >>>
> >>> By 'recently' in my previous email I meant the change was newer.
> >>>
> >>> Vincent:
> >>> Did the machine your broker ran on experience power issue ?
> >>>
> >>> Cheers
> >>>
> >>> On Sat, Jan 6, 2018 at 7:36 AM, Ismael Juma  wrote:
> >>>
>  Hi Ted,
> 
>  The change you mention is not part of 0.11.0.2.
> 
>  Ismael
> 
>  On Sat, Jan 6, 2018 at 3:31 PM, Ted Yu  wrote:
> 
> > bq. WARN Found a corrupted index file due to requirement failed: Corrupt
> > index found, index file
> > (/data/kafka/data-processed-15/54942918.index)
> >
> > Can you search backward for 54942918.index in the log to see if
> > we can find the cause for corruption?
> >
> > This part of the code was recently changed by:
> >
> > KAFKA-6324; Change LogSegment.delete to deleteIfExists and harden log
> > recovery
> >
> > Cheers
> >
> > On Sat, Jan 6, 2018 at 7:18 AM, Vincent Rischmann <
> vinc...@rischmann.fr>
> > wrote:
> >
> >> Here's an excerpt just after the broker started:
> >> https://pastebin.com/tZqze4Ya
> >>
> >> After more than 8 hours of recovery the broker finally started. I haven't
> >> read through all 8 hours of logs, but the parts I looked at are like the
> >> pastebin.
> >>
>