Re: Review Request 49212: RFC: SAMZA-855: Update kafka client to 0.10.0.0

2016-07-05 Thread Robert Crim


> On June 29, 2016, 2:16 a.m., Navina Ramesh wrote:
> > Robert,
> > Are you expecting more changes for 0.10.0 upgrade? Just checking on your 
> > progress. Thanks!

Yes, this just updates the client version but does not use the 0.9+ producer 
and consumer.


- Robert


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49212/#review139920
---


On June 24, 2016, 7:45 p.m., Robert Crim wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49212/
> ---
> 
> (Updated June 24, 2016, 7:45 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This is a WIP for updating the the kafka client libraries to 0.10+. So far, 
> I've updated the dependency and simply worked to get all existing tests 
> passing. The next steps are to further test/verify backwards compatiblity 
> with older brokers and moving the current `KafkaSystemFactory`, etc, to 
> `OldKafkaSystemFactory` and implementing the new clients.
> 
> 
> Diffs
> -
> 
>   build.gradle ba4a9d1 
>   gradle/dependency-versions.gradle 47c71bf 
>   
> samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
>  ea10cae 
>   
> samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManagerFactory.scala
>  4e97376 
>   
> samza-kafka/src/main/scala/org/apache/samza/config/RegExTopicGenerator.scala 
> 78467bf 
>   
> samza-kafka/src/main/scala/org/apache/samza/migration/KafkaCheckpointMigration.scala
>  5e8cc65 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala
>  ba8de5c 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
>  b373753 
>   
> samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
>  b574176 
>   samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala a25ba62 
>   
> samza-kafka/src/test/java/org/apache/samza/system/kafka/MockKafkaProducer.java
>  6f498de 
>   samza-kafka/src/test/java/org/apache/samza/utils/TestUtils.java 2fa743f 
>   
> samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TestKafkaCheckpointManager.scala
>  e6815da 
>   
> samza-kafka/src/test/scala/org/apache/samza/migration/TestKafkaCheckpointMigration.scala
>  504fc89 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
>  f00405d 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala
>  ece0359 
>   
> samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala
>  8e32bba 
>   
> samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala
>  8d7e3fe 
> 
> Diff: https://reviews.apache.org/r/49212/diff/
> 
> 
> Testing
> ---
> 
> Got `./gradlew clean check` passing. I've not been able to run the 
> integration tests (on any branch) but will do that next!
> 
> 
> Thanks,
> 
> Robert Crim
> 
>



Review Request 49212: RFC: SAMZA-855: Update kafka client to 0.10.0.0

2016-06-24 Thread Robert Crim

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49212/
---

Review request for samza.


Repository: samza


Description
---

This is a WIP for updating the the kafka client libraries to 0.10+. So far, 
I've updated the dependency and simply worked to get all existing tests 
passing. The next steps are to further test/verify backwards compatiblity with 
older brokers and moving the current `KafkaSystemFactory`, etc, to 
`OldKafkaSystemFactory` and implementing the new clients.


Diffs
-

  build.gradle ba4a9d1 
  gradle/dependency-versions.gradle 47c71bf 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 ea10cae 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManagerFactory.scala
 4e97376 
  samza-kafka/src/main/scala/org/apache/samza/config/RegExTopicGenerator.scala 
78467bf 
  
samza-kafka/src/main/scala/org/apache/samza/migration/KafkaCheckpointMigration.scala
 5e8cc65 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
ba8de5c 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
 b373753 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
 b574176 
  samza-kafka/src/main/scala/org/apache/samza/util/KafkaUtil.scala a25ba62 
  
samza-kafka/src/test/java/org/apache/samza/system/kafka/MockKafkaProducer.java 
6f498de 
  samza-kafka/src/test/java/org/apache/samza/utils/TestUtils.java 2fa743f 
  
samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TestKafkaCheckpointManager.scala
 e6815da 
  
samza-kafka/src/test/scala/org/apache/samza/migration/TestKafkaCheckpointMigration.scala
 504fc89 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemAdmin.scala
 f00405d 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala
 ece0359 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemProducer.scala
 8e32bba 
  
samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala
 8d7e3fe 

Diff: https://reviews.apache.org/r/49212/diff/


Testing
---

Got `./gradlew clean check` passing. I've not been able to run the integration 
tests (on any branch) but will do that next!


Thanks,

Robert Crim



Re: Kafka dependency

2016-05-10 Thread Robert Crim
Just want to chime in to add we'd like to use some of the new security
features in Kafka 0.9, ACLs and TLS in particular.

On Tue, May 10, 2016 at 11:24 AM, Nick Quinn  wrote:

> Thanks Yi!
>
> One more question, if I may:  I saw that you dropped a 0.10.0 in December
> of 2015, but the version in Github is still 0.10.0-rc. Do you guys plan on
> an official release of 0.10.0 or are you settling for a simple rc release?
> The reason I ask is that it is hard to match the git commits to the release
> when there is no official 0.10.0 release.
>
> Thanks!
> Nick
>
>
> -Original Message-
> From: Yi Pan [mailto:nickpa...@gmail.com]
> Sent: Tuesday, May 10, 2016 11:22 AM
> To: dev@samza.apache.org
> Subject: Re: Kafka dependency
>
> Hi, Nick,
>
> We do have plan to update the Kafka dependency in Samza. However, Samza
> only uses Kafka client library. We have confirmed that any Kafka 0.8.2
> clients should be supported by Kafka 0.9 brokers. Hence, it should not
> block you if you are thinking of upgrading Kafka broker versions (e.g.
> LinkedIn has been running with this combination for a long time). The main
> reason that we are taking a bit careful path on the upgrading of Kafka
> client version is that there are Samza users who are still running Kafka
> 0.8.2 broker version and if we force upgrade the client version to 0.9, it
> would likely not be supported by a lower version of Kafka broker.
>
> If you have any specific use case that requires Kafka client version 0.9
> and above, please speak up. We will put into consideration in our upgrade
> plan and timeline.
>
> Thanks a lot!
>
> -Yi
>
> On Tue, May 10, 2016 at 10:28 AM, Nick Quinn 
> wrote:
>
> > Hi guys-
> >
> >
> >
> > I was wondering why Samza still has a dependency on Kafka 0.8.2. Does
> > your development team have any plans to update the Kafka dependency
> > version that Samza is using?
> >
> > Best,
> >
> > Nick
> >
> >
>


Re: Exactly once processing

2016-04-15 Thread Robert Crim
Looking at:
https://github.com/apache/samza/blob/f02386464d31b5a496bb0578838f51a0331bfffa/samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala#L171


The commit function, in order, does:
1. Flushes metrics
2. Flushes stores
3. Produces messages from the collectors
4. Write offsets

So I would reason that it would be OK to store an offset you've seen in the
store and use that to skip the messages if you've already mutated your data
-- but be aware any of 2 (if multiple stores) ,3, or 4 may not have
happened so you might want to do those again. You'd need to be careful if
your changes span multiple stores or keys since multiple writes to
changelogs are not atomic.

Question to maintainers: is it safe for Samza users to relay on this order?

On Fri, Apr 15, 2016 at 11:31 AM, Sabarish Sasidharan <
sabarish@gmail.com> wrote:

> Hi Guozhang
>
> Thanks. Assuming the checkpoint would typically be behind the offset
> persisted in my store (+ changelog), when the messages are replayed
> starting from the checkpoint, I can very well skip those by comparing
> against the offset in my store right? So I am not understanding why
> duplicates would affect my state.
>
> Regards
> Sab
>
> On Fri, Apr 15, 2016 at 10:07 PM, Guozhang Wang 
> wrote:
>
> > Hi Sab,
> >
> > For stateful processing where you have persistent state stores, you need
> to
> > maintain the checkpoint which includes the committed offsets as well as
> the
> > store flushed in sync, but right not these two operations are not done
> > atomically, and hence if you fail in between, you could still get
> > duplicates where you consume from the committed offsets while some of
> them
> > have already updated the stores.
> >
> > Guozhang
> >
> >
> > On Thu, Apr 14, 2016 at 11:56 PM, Sasidharan, Sabarish <
> > sabarish.sasidha...@harman.com> wrote:
> >
> > > Hi
> > >
> > > To achieve exactly once processing for my aggregates, wouldn’t it be
> > > enough if I maintain the latest offset processed for the aggregate and
> > > check against that offset when messages are replayed on recovery? Am I
> > > missing something here?
> > >
> > > Thanks
> > >
> > > Regards
> > > Sab
> >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: ThreadJobFactory in production

2016-03-02 Thread Robert Crim
Thanks for the clarification -- very helpful.

I'll take a look at those tickets!


On Wed, Mar 2, 2016 at 2:11 PM, Yi Pan  wrote:

> Hi, Robert,
>
> The main reason that ThreadJobFactory and ProcessJobFactory are not
> considered "production-ready" is that there is only one container for the
> job and all tasks are assigned to the single container. Hence, it is not
> easy to scale out of a single host.
>
> As Rick mentioned, Netflix has put up a patch in SAMZA-41 based on 0.9.1 o
> allow static assignment of a subset of partitions to a single ProcessJob,
> which allows to launch multiple ProcessJobs in different hosts. We planned
> to merge it to 0.10. But it turns out that too much changes have gone into
> 0.10 and it became difficult to merge the patch. At this point, we can
> still try the following two options:
> 1) We can attempt to merge SAMZA-41 to 0.10.1 again, it may take some
> effort but would give a stop-gap solution.
> 2) We are working on a standalone Samza model (SAMZA-516, SAMZA-881) to
> allow users to run Samza w/o depending on YarnJobFactory. This is a
> long-term effort and will take some time to flesh out. Please join the
> discussion there s.t. we can be more aligned in our effort.
>
> Hope the above gives you an overall picture on where we are going.
>
> Thanks a lot!
>
> -Yi
>
> On Wed, Mar 2, 2016 at 1:28 PM, Rick Mangi  wrote:
>
> > There was an interesting thread a while back from I believe the netflix
> > guys about running ThreadJobFactory in production.
> >
> >
> > > On Mar 2, 2016, at 4:20 PM, Robert Crim  wrote:
> > >
> > > Hi,
> > >
> > > We're currently working on a solution that allows us to run Samza jobs
> on
> > > Mesos. This seems to be going well, and something we'd like to move
> away
> > > from when native Mesos support is added to Samza.
> > >
> > > While we're developing and testing our scheduler, I'm wondering about
> the
> > > implications of running tasks with the ThreadJobFactory in
> "production".
> > > The documentation advise against this, but it's not clear why.
> > >
> > > If we were using the ThreadJobFactory inside of a docker container on
> > Mesos
> > > with Marathon for production, would be our main problem? These are not
> > > particularly high-load tasks. Aside from not be able to get
> find-grained
> > > resource scheduling per-task, it seems like the main issue the not
> being
> > to
> > > easily tell when a job stops due to error / exception.
> > >
> > > In other words, what would be stop-stopping reasons to not use the
> > > TreadJobFactory in production?
> > >
> > > Thanks,
> > > Rob
> >
> >
>


ThreadJobFactory in production

2016-03-02 Thread Robert Crim
Hi,

We're currently working on a solution that allows us to run Samza jobs on
Mesos. This seems to be going well, and something we'd like to move away
from when native Mesos support is added to Samza.

While we're developing and testing our scheduler, I'm wondering about the
implications of running tasks with the ThreadJobFactory in "production".
The documentation advise against this, but it's not clear why.

If we were using the ThreadJobFactory inside of a docker container on Mesos
with Marathon for production, would be our main problem? These are not
particularly high-load tasks. Aside from not be able to get find-grained
resource scheduling per-task, it seems like the main issue the not being to
easily tell when a job stops due to error / exception.

In other words, what would be stop-stopping reasons to not use the
TreadJobFactory in production?

Thanks,
Rob