Review Request 35515: SAMZA-449 Expose RocksDB statistic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35515/ --- Review request for samza. Repository: samza Description --- RocksDB statistic Diffs - samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala a423f7bd6c43461e051b5fd1f880dd01db785991 samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala a428a16bc1e9ab4980a6f17db4fd810057d31136 Diff: https://reviews.apache.org/r/35515/diff/ Testing --- Thanks, Gustavo Anatoly F. V. Solís
Re: 3 processed message per incoming message
This is what I see on Yarn monitoring page: As we can see, there are 9998 apps pending. There is some 10k limit we are hitting. I see only 1 app running. Apps SubmittedApps PendingApps RunningApps CompletedContainers RunningMemory UsedMemory TotalMemory ReservedVCores UsedVCores TotalVCores ReservedActive NodesDecommissioned NodesLost NodesUnhealthy NodesRebooted Nodes1000199981222 GB8 GB0 B2801 http://sprdargas403.corp.intuit.net:8088/cluster/nodes0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/decommissioned0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/lost0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/unhealthy0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/rebooted Show 20406080100 entries Search: ID User Name Application Type Queue StartTime FinishTime State FinalStatus Progress Tracking UI application_1431716639228_0120 http://sprdargas403.corp.intuit.net:8088/cluster/app/application_1431716639228_0120 rootArgos_1SamzadefaultFri, 15 May 2015 19:10:21 GMTN/ARUNNINGUNDEFINED ApplicationMaster http://sprdargas403.corp.intuit.net:8088/proxy/application_1431716639228_0120/
Re: [DISCUSS] Samza 0.9.1 release
Hi, Shekar, This 0.9.1 is a bug-fix only release. No features added yet. New features are expected in 0.10.0. Thanks! On Tue, Jun 16, 2015 at 10:59 AM, Shekar Tippur ctip...@gmail.com wrote: Wang, I have not caught up but can you please highlight if there are any feature additions as well? - Shekar On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: 3 processed message per incoming message
Hi Shekar, Ok. If there is only one application is running, if you kill this one, will you still be able to see the processed messages coming? If not, I think the code in your application maybe the cause of the problem. We can have a further look at your code to see where the problem is. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 7:45 AM, Shekar Tippur ctip...@gmail.com wrote: This is what I see on Yarn monitoring page: As we can see, there are 9998 apps pending. There is some 10k limit we are hitting. I see only 1 app running. Apps SubmittedApps PendingApps RunningApps CompletedContainers RunningMemory UsedMemory TotalMemory ReservedVCores UsedVCores TotalVCores ReservedActive NodesDecommissioned NodesLost NodesUnhealthy NodesRebooted Nodes1000199981222 GB8 GB0 B2801 http://sprdargas403.corp.intuit.net:8088/cluster/nodes0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/decommissioned0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/lost0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/unhealthy0 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/rebooted Show 20406080100 entries Search: ID User Name Application Type Queue StartTime FinishTime State FinalStatus Progress Tracking UI application_1431716639228_0120 http://sprdargas403.corp.intuit.net:8088/cluster/app/application_1431716639228_0120 rootArgos_1SamzadefaultFri, 15 May 2015 19:10:21 GMTN/ARUNNINGUNDEFINED ApplicationMaster http://sprdargas403.corp.intuit.net:8088/proxy/application_1431716639228_0120/
improving hello-samza / testing
I'm learning samza by the hello-samza project and notice the lack of tests. Where's a good place to learn how folks are properly testing things written with samza? Thanks, --tim
Re: [DISCUSS] Samza 0.9.1 release
+1 Thanks. 2015-06-16 14:15 GMT-03:00 Yan Fang yanfang...@gmail.com: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
+1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman
Re: [DISCUSS] Samza 0.9.1 release
Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
+1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
Wang, I have not caught up but can you please highlight if there are any feature additions as well? - Shekar On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: Review Request 34974: SAMZA-676: implement broadcast stream
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34974/ --- (Updated June 16, 2015, 9:06 p.m.) Review request for samza. Changes --- rebased to latest master Bugs: SAMZA-676 https://issues.apache.org/jira/browse/SAMZA-676 Repository: samza Description --- 1. added offsetComparator method in SystemAdmin Interface 2. added task.global.inputs config 3. rewrote Grouper classes using Java; allows to assign global streams during grouping 4. used LinkedHashSet instead of HashSet in CoordinatorStreamSystemConsumer to preserve messages order 5. added taskNames to the offsets in OffsetManager 6. allowed to assign one SSP to multiple taskInstances 7. skipped already-processed messages in RunLoop 8. unit tests for all changes Diffs (updated) - checkstyle/import-control.xml 3374f0c samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 7a588eb samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java 249b8ae samza-core/src/main/java/org/apache/samza/config/TaskConfigJava.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartition.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartitionFactory.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java PRE-CREATION samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 20e5d26 samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala e4b14f4 samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala cbacd18 samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala c5a5ea5 samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupByPartition.scala 44e95fc samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.scala 3c0acad samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala 097f410 samza-core/src/test/java/org/apache/samza/config/TestTaskConfigJava.java PRE-CREATION samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupByPartition.java PRE-CREATION samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java PRE-CREATION samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 8d54c46 samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 64a5844 samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 9fb1aa9 samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 7caad28 samza-core/src/test/scala/org/apache/samza/container/grouper/stream/GroupByTestBase.scala a14169b samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupByPartition.scala 74daf72 samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.scala deb3895 samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala d9ae187 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 35086f5 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala de00320 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala 1629035 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala 2a84328 samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java b063366 samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala 1e936b4 Diff: https://reviews.apache.org/r/34974/diff/ Testing --- Thanks, Yan Fang
[DISCUSS] Samza 0.9.1 release
Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
+1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang
Re: Review Request 34974: SAMZA-676: implement broadcast stream
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/34974/ --- (Updated June 16, 2015, 9:23 p.m.) Review request for samza. Changes --- latest patch Bugs: SAMZA-676 https://issues.apache.org/jira/browse/SAMZA-676 Repository: samza Description --- 1. added offsetComparator method in SystemAdmin Interface 2. added task.global.inputs config 3. rewrote Grouper classes using Java; allows to assign global streams during grouping 4. used LinkedHashSet instead of HashSet in CoordinatorStreamSystemConsumer to preserve messages order 5. added taskNames to the offsets in OffsetManager 6. allowed to assign one SSP to multiple taskInstances 7. skipped already-processed messages in RunLoop 8. unit tests for all changes Diffs (updated) - checkstyle/import-control.xml 3374f0c samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 7a588eb samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java 249b8ae samza-core/src/main/java/org/apache/samza/config/TaskConfigJava.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartition.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartitionFactory.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.java PRE-CREATION samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java PRE-CREATION samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 20e5d26 samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala e4b14f4 samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala cbacd18 samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala c5a5ea5 samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupByPartition.scala 44e95fc samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.scala 3c0acad samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala 097f410 samza-core/src/test/java/org/apache/samza/config/TestTaskConfigJava.java PRE-CREATION samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupByPartition.java PRE-CREATION samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java PRE-CREATION samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 8d54c46 samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 64a5844 samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 9fb1aa9 samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 7caad28 samza-core/src/test/scala/org/apache/samza/container/grouper/stream/GroupByTestBase.scala a14169b samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupByPartition.scala 74daf72 samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.scala deb3895 samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala d9ae187 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 35086f5 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala de00320 samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala 1629035 samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala 2a84328 samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java b063366 samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala 1e936b4 Diff: https://reviews.apache.org/r/34974/diff/ Testing --- Thanks, Yan Fang
Re: improving hello-samza / testing
We've built a driver program which kinda falls along approach (1) listed in your email. The driver program accepts a custom task object and has a way to inject data - which in turn invokes the process method. For now we're assuming logical time and use the frequency of process() invocations to deduce when to invoke the window() method (for eg: invoke window once for every 4 calls to process). We've also built our own Incoming and Outgoing envelope - which is just in the form of a Java List. This is how the result is evaluated (either get the full result list or define a callback which is invoked every time a collector.send is called). Its still work in progress and the goal is to make unit testing along the lines of Storm unit tests. On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini criccom...@apache.org wrote: Hey Tim, This is a really good discussion to have. The testing that I've seen with Samza falls into two categories: 1. Instantiate your StreamTask, and mock all params in the process()/init() methods. 2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds messages into a topic, and validates it gets messages back out from the output topic. 3. A full blown integration test that uses Zopkio. For an example of (2), in practice, have a look at TestStatefuleTask: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD As you can see, writing this kind of integration test can be a bit painful. (3) is documented here: http://samza.apache.org/contribute/tests.html Another way to test would be to start a full-blown container using ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out the system consumer/system producer. Has anyone else tested Samza in other ways? Cheers, Chris On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com wrote: I'm learning samza by the hello-samza project and notice the lack of tests. Where's a good place to learn how folks are properly testing things written with samza? Thanks, --tim -- Thanks and regards Chinmay Soman
Re: [DISCUSS] Samza 0.9.1 release
Thank you! Sent using CloudMagichttps://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org wrote: +1 Here. On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote: Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Measuring Samza Job Throughput
Hi Devs, I was looking for a way to measure Samza job throughput and found that its possible to do it via Samza's metrics reporter. But there several types of metrics reported via this method. For example, TaskInstanceMetrics reports number of messages sent. But if I wanted to get a measurement like bytes per second produced, is there a way to do that. It looks like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide number of messages sent. If any of you have any experience in measuring Samza job throughput, can you please share. Really appreciate any ideas on measuring job throughput. Thanks Milinda -- Milinda Pathirage PhD Student | Research Assistant School of Informatics and Computing | Data to Insight Center Indiana University twitter: milindalakmal skype: milinda.pathirage blog: http://milinda.pathirage.org
Re: improving hello-samza / testing
https://issues.apache.org/jira/browse/SAMZA-681 tracks the first effort towards the driver program or unit test harness for samza tasks that Chinmay is referring to. ᐧ On Tue, Jun 16, 2015 at 6:11 PM, Chinmay Soman chinmay.cere...@gmail.com wrote: We've built a driver program which kinda falls along approach (1) listed in your email. The driver program accepts a custom task object and has a way to inject data - which in turn invokes the process method. For now we're assuming logical time and use the frequency of process() invocations to deduce when to invoke the window() method (for eg: invoke window once for every 4 calls to process). We've also built our own Incoming and Outgoing envelope - which is just in the form of a Java List. This is how the result is evaluated (either get the full result list or define a callback which is invoked every time a collector.send is called). Its still work in progress and the goal is to make unit testing along the lines of Storm unit tests. On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini criccom...@apache.org wrote: Hey Tim, This is a really good discussion to have. The testing that I've seen with Samza falls into two categories: 1. Instantiate your StreamTask, and mock all params in the process()/init() methods. 2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds messages into a topic, and validates it gets messages back out from the output topic. 3. A full blown integration test that uses Zopkio. For an example of (2), in practice, have a look at TestStatefuleTask: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD As you can see, writing this kind of integration test can be a bit painful. (3) is documented here: http://samza.apache.org/contribute/tests.html Another way to test would be to start a full-blown container using ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out the system consumer/system producer. Has anyone else tested Samza in other ways? Cheers, Chris On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com wrote: I'm learning samza by the hello-samza project and notice the lack of tests. Where's a good place to learn how folks are properly testing things written with samza? Thanks, --tim -- Thanks and regards Chinmay Soman
Re: [DISCUSS] Samza 0.9.1 release
+1 On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com wrote: Thank you! Sent using CloudMagic https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org wrote: +1 Here. On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote: Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
+1 On Tue, Jun 16, 2015 at 7:41 PM, Shekar Tippur ctip...@gmail.com wrote: +1 On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com wrote: Thank you! Sent using CloudMagic https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org wrote: +1 Here. On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote: Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Re: [DISCUSS] Samza 0.9.1 release
+1 On Jun 16, 2015, at 7:41 PM, Shekar Tippur ctip...@gmail.com wrote: +1 On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com wrote: Thank you! Sent using CloudMagic https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org wrote: +1 Here. On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote: Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Re: Review Request 35241: refactoring the code for coordinator stream writer
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35241/#review88174 --- samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java (line 41) https://reviews.apache.org/r/35241/#comment140582 Can you add a .sh wrapper using run-job to actually run the job from command line? Take a look at samza-shell/src/main/bash/checkpoint-tool.sh. You can follow a similar pattern to run the CoordinatorStreamWriter class. samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java (line 61) https://reviews.apache.org/r/35241/#comment140581 Do we really need this check? There are no other components starting the same write thread. samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java (line 125) https://reviews.apache.org/r/35241/#comment140580 I was thinking about how a continuous input is more useful than a one-time command. I think it is safer to expose / allow writing only 1 config change at a time. This will make input validation simpler and also, avoid the job-coordinator to react to all config changes at the same time. Can you change this to input only 1 config key/value pair at a time ? samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java (line 60) https://reviews.apache.org/r/35241/#comment140583 nit: spacing - Navina Ramesh On June 9, 2015, 11:53 p.m., Shadi A. Noghabi wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35241/ --- (Updated June 9, 2015, 11:53 p.m.) Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and Naveen Somasundaram. Repository: samza Description --- In order to be able to change configurations while a job is running, a tool for writing a message to the coordinator stream is needed. This code targets creating such a tool that can write messages to the coordinator stream after the bootstrap of the job. This code is related to the SAMZA-704 JIRA. To run the code use the folowing command: path to samza deployment/bin/run-class.sh org.apache.samza.coordinator.stream.CoordinatorStreamWriter --config-factory=config factory --config-path=path to config file of a job Diffs - checkstyle/import-control.xml 3374f0c432e61ac4cda275377604cfd481f0cddf samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java 6c1e488d00d8593d59c89b57e673e0b6b90fd7d2 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java PRE-CREATION samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java 647cadb3a4e51bec8204197d77ad35a6b29afcec samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java 68e32554c18f443565284b807f43f4a5b05afbce samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java PRE-CREATION Diff: https://reviews.apache.org/r/35241/diff/ Testing --- Thanks, Shadi A. Noghabi
Re: [DISCUSS] Samza 0.9.1 release
+1 Here. On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote: Cool. I will start a voting process soon. On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com wrote: +1 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh nram...@linkedin.com.invalid wrote: +1 for the release! On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote: +1 Agreed. Thanks! On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote: Agreed on this. Thanks, Fang, Yan yanfang...@gmail.com On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote: Hi all, We have been running a couple of our jobs against `0.9.1` branch last week at LinkedIn with some critical bug fixes back-ported, including: SAMZA-608 Deserialization error causes SystemConsumers to hang SAMZA-616 Shutdown hook does not wait for container to finish SAMZA-658 Iterator.remove breaks caching layer SAMZA-662 / 686 Samza auto-creates changelog stream without sufficient partitions when container number 1 I am proposing a release vote on the current 0.9.1 branch for these bug fixes. Thoughts? -- Guozhang -- Thanks and regards Chinmay Soman -- -- Guozhang
Re: improving hello-samza / testing
Hey Tim, This is a really good discussion to have. The testing that I've seen with Samza falls into two categories: 1. Instantiate your StreamTask, and mock all params in the process()/init() methods. 2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds messages into a topic, and validates it gets messages back out from the output topic. 3. A full blown integration test that uses Zopkio. For an example of (2), in practice, have a look at TestStatefuleTask: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD As you can see, writing this kind of integration test can be a bit painful. (3) is documented here: http://samza.apache.org/contribute/tests.html Another way to test would be to start a full-blown container using ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out the system consumer/system producer. Has anyone else tested Samza in other ways? Cheers, Chris On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com wrote: I'm learning samza by the hello-samza project and notice the lack of tests. Where's a good place to learn how folks are properly testing things written with samza? Thanks, --tim
Powered by page update
Hey all, I'm seeing a lot of new faces on the mailing list, which is really awesome. I want to invite you all to add yourselves to our Powered by page: https://cwiki.apache.org/confluence/display/SAMZA/Powered+By The Apache wiki is pretty locked down due to spam. If you'd like to send me a link and short write-up, I'll be happy to add your entry to the page, though. Cheers, Chris
Re: Review Request 35492: SAMZA-701 : Hello Samza - Port docker setup from hadoop-common
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35492/ --- (Updated June 16, 2015, 7:36 a.m.) Review request for samza. Summary (updated) - SAMZA-701 : Hello Samza - Port docker setup from hadoop-common Repository: samza-hello-samza Description --- Take the really useful docker setup from avro and hadoop-common and make it work for hello samza Diffs - README.md 4463454 conf/yarn-site.xml 9028590 dev-support/docker/Dockerfile PRE-CREATION dev-support/docker/hadoop_env_checks.sh PRE-CREATION start-env.sh PRE-CREATION Diff: https://reviews.apache.org/r/35492/diff/ Testing --- * Run ./start-env.sh from the top level directory * Followed the instructions from Start a Grid on thsi page : http://samza.apache.org/startup/hello-samza/0.8/ Thanks, Darrell Taylor
Review Request 35492: Port docker setup from hadoop-common
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/35492/ --- Review request for samza. Repository: samza-hello-samza Description --- Take the really useful docker setup from avro and hadoop-common and make it work for hello samza Diffs - README.md 4463454 conf/yarn-site.xml 9028590 dev-support/docker/Dockerfile PRE-CREATION dev-support/docker/hadoop_env_checks.sh PRE-CREATION start-env.sh PRE-CREATION Diff: https://reviews.apache.org/r/35492/diff/ Testing --- * Run ./start-env.sh from the top level directory * Followed the instructions from Start a Grid on thsi page : http://samza.apache.org/startup/hello-samza/0.8/ Thanks, Darrell Taylor