Streams meetup@LinkedIn (Mar 21)

2018-03-10 Thread Becket Qin
Hi Kafka users and developers,

We are going to host our quarterly Stream Processing Meetup@LinkedIn on Mar
21. There will be three talks about Apache Pulsar, Apache Calcite and
Apache Samza, as well as LinkedIn's latest K-V store Venice built on top of
Apache Kafka and Apache Samza.

Please check the details below if you are interested.

https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/248309045/


Thanks,

Jiangjie (Becket) Qin


RE: Suggestion over architecture

2018-03-10 Thread adrien ruffie
You're right, but I'm worried about the rate of unauthorized queries the
endpoint could receive... which could become a bottleneck...

For now I am thinking about how to estimate the number of partitions. For the
formula max(t/p, t/c) I don't really know the real value of "t" yet, but I
think the producers at the endpoint will peak around 100 MB/s and the
consumers will handle 25-30 MB/s...
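
For what it's worth, the sizing rule mentioned above is easy to sanity-check in a few lines. The 50 MB/s producer-side per-partition figure below is an assumed placeholder, not a measured number:

```python
import math

# Common sizing rule: partitions >= max(t/p, t/c), where t is the target
# throughput, p the measured per-partition producer throughput, and c the
# measured per-partition consumer throughput.
def estimate_partitions(t_mb_s: float, p_mb_s: float, c_mb_s: float) -> int:
    return math.ceil(max(t_mb_s / p_mb_s, t_mb_s / c_mb_s))

# Figures from this thread: t = 100 MB/s peak, consumers at ~25 MB/s per
# partition; the 50 MB/s producer figure is an assumption for the sketch.
print(estimate_partitions(100, 50, 25))  # prints 4
```

In practice both p and c should be benchmarked on the actual hardware, since they depend on batching, compression, and consumer processing cost.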


From: Svante Karlsson
Sent: Saturday, 10 March 2018 20:16:10
To: users@kafka.apache.org
Subject: Re: Suggestion over architecture

Yes, but I misread his reply and thought that he meant the "kafka rest
proxy". But now I see that we say the same thing - sorry for the confusion.

The normal way to do the authentication and authorization would be in the
rest/grpc endpoint before sending it to kafka.

2018-03-10 19:39 GMT+01:00 adrien ruffie :

> Thanks Nick, thanks Svante,
>
>
> Svante, you are saying the same as Nick, right? Send a client message type
> which encapsulates the emailing to a REST endpoint in our infrastructure,
> and the endpoint
>
> pushes it into a Kafka topic?
>
> And if we need to ensure that a client sending an emailing is allowed,
> where would you check its authorization? After message reception on the
> REST endpoint? Directly by the sender in the on-premise webapp? Just before
> pushing to the topic? I think it's really better to check that before
> sending the message to our infrastructure side, but the webapp doesn't know
> whether it is allowed or not...
>
>
>
> thanks for your reply 😊
>
> Adrien
>
>
> 
> From: Svante Karlsson
> Sent: Saturday, 10 March 2018 19:13:04
> To: users@kafka.apache.org
> Subject: Re: Suggestion over architecture
>
> You do not want to expose the kafka instance to your different clients. Put
> some API endpoint in between: REST/gRPC or whatever.
>
> 2018-03-10 19:01 GMT+01:00 Nick Vasilyev :
>
> > Hard to say without more info, but why not just deploy something like a
> > REST API and expose it to your clients; they will send the data to the
> > API
> > and it will in turn feed the Kafka topic.
> >
> > You will minimize coupling and be able to scale/upgrade more easily.
> >
> > On Mar 10, 2018 2:47 AM, "adrien ruffie" 
> > wrote:
> >
> > > Hello all,
> > >
> > >
> > > in my company we plan to set up the following architecture for our
> > > clients:
> > >
> > >
> > > an internal Kafka cluster at our company, and a webapp (our
> > > software solution) deployed on premise for each client.
> > >
> > >
> > > We are thinking of creating one producer per "webapp" client, pushing
> > > messages that each represent an email into a global topic in our
> > > Kafka cluster.
> > >
> > >
> > > The idea behind this is to offload from the client webapp the
> > > processing of several mass-mailing operation groups, and handle them
> > > ourselves on
> > > dedicated servers in our infrastructure. Each dedicated server
> > > will
> > > be a consumer of the topic through which the messages (emails) are
> > > streamed.
> > >
> > >
> > > My main question is: do you think each client can be a producer?
> > > (if we have, for example, 200-300 clients?)
> > >
> > > Second question: should each client be a producer? 😊
> > >
> > > Do you have another idea for this subject?
> > >
> > >
> > > Thank you & best regards.
> > >
> > >
> > > Adrien
> > >
> >
>


RE: Suggestion over architecture

2018-03-10 Thread adrien ruffie
Don't worry :-) I understood. Thanks a lot!


Adrien




Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
Yes, but I misread his reply and thought that he meant the "kafka rest
proxy". But now I see that we say the same thing - sorry for the confusion.

The normal way to do the authentication and authorization would be in the
rest/grpc endpoint before sending it to kafka.
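
A minimal sketch of that auth-before-produce pattern. All names here (the token store, the `produce` callable) are hypothetical; a real endpoint would sit behind TLS, use a proper auth service, and a real Kafka client such as confluent-kafka's `Producer`:

```python
# Hypothetical sketch: the endpoint authenticates/authorizes the caller first,
# and only then hands the message to Kafka. The producer is injected so the
# same logic works with any Kafka client.

AUTHORIZED_TOKENS = {"client-42": "s3cret"}  # stand-in for a real auth service

def handle_email(client_id: str, token: str, payload: bytes, produce) -> int:
    """Return an HTTP-style status; reject before anything reaches the topic."""
    if AUTHORIZED_TOKENS.get(client_id) != token:
        return 401  # unauthorized: never forwarded to Kafka
    produce("emails", key=client_id.encode(), value=payload)
    return 202  # accepted for asynchronous delivery

# Tiny demo with a fake producer that just records the calls.
sent = []
fake_produce = lambda topic, key, value: sent.append((topic, key, value))
print(handle_email("client-42", "s3cret", b"hello", fake_produce))  # prints 202
print(handle_email("client-42", "wrong", b"hello", fake_produce))   # prints 401
```

Doing the check at the endpoint also answers the rate concern upthread: unauthorized requests are dropped at the edge and never load the cluster.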



Re: Consultant Help

2018-03-10 Thread Martin Gainty
From: Svante Karlsson 
Sent: Friday, March 2, 2018 3:50 PM
To: users@kafka.apache.org
Subject: Re: Consultant Help

try https://www.confluent.io/ - that's what they do
/svante
mg> Svante, you and I know Kafka as well, but a relocation to SLC is a tall order indeed.
mg> I am currently looking for an onsite resource through an LDS resource I used to work for
in SLC.
mg> Please contact me offline for the appropriate details.

2018-03-02 21:21 GMT+01:00 Matt Stone :

> We are looking for a consultant or contractor that can come onsite to our
> Ogden, Utah location in the US, to help with a Kafka set up and maintenance
> project.  What we need is someone with the knowledge and experience to
> build out the Kafka environment from scratch.
>
> We are thinking they would need to be onsite for 6-12 months to set it
> up and mentor some of our team so they can get up to speed to do the
> maintenance once the contractor is gone.  If you have experience
> setting up Kafka from scratch in a Linux environment, maintaining node
> clusters, and training others on the team how to do it, and you are
> interested in a long-term project working at the client site, I would love
> to start a discussion to see if we could use you for the role.
>
> I would also be interested in hearing about any consulting firms that
> might have resources that could help with this role.
>
> Matt Stone
>
>
> -Original Message-
> From: Matt Daum [mailto:m...@setfive.com]
> Sent: Friday, March 2, 2018 1:11 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka Setup for Daily counts on wide array of keys
>
> Actually it looks like the better way would be to output the counts to a
> new topic then ingest that topic into the DB itself.  Is that the correct
> way?
>
> On Fri, Mar 2, 2018 at 9:24 AM, Matt Daum  wrote:
>
> > I am new to Kafka but I think I have a good use case for it.  I am
> > trying to build daily counts of requests based on a number of
> > different attributes in a high-throughput system (~1 million
> > requests/sec. across all 8 servers).  The different attributes are
> > unbounded in terms of values, and some will span hundreds of
> > millions of values.  This is my current thought process; let me know
> > where I could be more efficient or if there is a better way to do it.
> >
> > I'll create an AVRO object "Impression" which has all the attributes
> > of the inbound request.  On each request, my application servers will
> > create one and send it to a single Kafka topic.
> >
> > I'll then have a consumer which creates a stream from the topic.  From
> > there I'll use the windowed timeframes and groupBy to group by the
> > attributes on each given day.  At the end of the day I'd need to read
> > out the data store to an external system for storage.  Since I won't
> > know all the values I'd need something similar to the KVStore.all()
> > but for windowed KV stores.  It appears this would be possible in 1.1
> > with this commit:
> > https://github.com/apache/kafka/commit/1d1c8575961bf6bce7decb049be7f10ca76bd0c5
> >
> > Is this the best approach?  Or would I be better off using the stream
> > to listen and an external DB like Aerospike to store the counts,
> > reading out of it directly at the end of the day?
> >
> > Thanks for the help!
> > Daum
> >
>
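
The windowed-count idea quoted above can be sketched without a broker. The snippet below is plain Python standing in for what a Kafka Streams one-day windowed `count()` would compute (the attribute names and timestamps are made up; in the real pipeline the results would be written to an output topic for a sink to load into the DB):

```python
from collections import Counter
from datetime import datetime, timezone

# Plain-Python stand-in for the daily windowed count described above.
def daily_counts(impressions):
    """impressions: iterable of (epoch_ms, attribute) pairs."""
    counts = Counter()
    for ts_ms, attr in impressions:
        day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date().isoformat()
        counts[(day, attr)] += 1  # one bucket per (UTC day, attribute)
    return counts

events = [(1520640000000, "campaign-a"),   # 2018-03-10 UTC
          (1520640001000, "campaign-a"),
          (1520726400000, "campaign-a")]   # 2018-03-11 UTC
print(daily_counts(events))
```

The same grouping logic is what the windowed store holds; reading it all out at end of day is the part that needed the commit linked above.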


RE: Suggestion over architecture

2018-03-10 Thread adrien ruffie
Thanks Nick, thanks Svante,


Svante, you are saying the same as Nick, right? Send a client message type which
encapsulates the emailing to a REST endpoint in our infrastructure, and the
endpoint

pushes it into a Kafka topic?

And if we need to ensure that a client sending an emailing is allowed, where
would you check its authorization? After message reception on the REST
endpoint? Directly by the sender in the on-premise webapp? Just before pushing
to the topic? I think it's really better to check that before sending the
message to our infrastructure side, but the webapp doesn't know whether it is
allowed or not...



thanks for your reply 😊

Adrien




slow log segment loading during kafka 0.10.2 -> 1.0.1 upgrade

2018-03-10 Thread Yu Yang
Hi,

We are experimenting with upgrading our Kafka service from 0.10.2 to 1.0.1, and
noticed that restarting a broker after updating the Kafka binary was slow. On a
test cluster that has ~120 GB of data on each broker, it took about 20 minutes
to load log segments. I am wondering if there is any workaround to reduce the
long log-segment loading time. Can we manually generate snapshot files to speed
up the segment loading? Based on the server logs, it is also not clear to me
which of `loadSegments()` and `loadProducerState(...)` took more time during
the log-segment loading.

Thanks!

Regards,
-Yu
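
One broker setting worth checking for this symptom (hedged: it parallelizes segment loading/recovery per data directory, and may not shorten the producer-state rebuild specifically) is `num.recovery.threads.per.data.dir`, which defaults to 1. A sketch of the change:

```properties
# server.properties (sketch): use more threads per log.dirs entry when
# loading/recovering log segments at broker startup and flushing at shutdown.
num.recovery.threads.per.data.dir=8
```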


Re: Suggestion over architecture

2018-03-10 Thread Svante Karlsson
You do not want to expose the kafka instance to your different clients. Put
some API endpoint in between: REST/gRPC or whatever.



Re: Suggestion over architecture

2018-03-10 Thread Nick Vasilyev
Hard to say without more info, but why not just deploy something like a
REST API and expose it to your clients; they will send the data to the API
and it will in turn feed the Kafka topic.

You will minimize coupling and be able to scale/upgrade more easily.
