Review Request 35515: SAMZA-449 Expose RocksDB statistic

2015-06-16 Thread Gustavo Anatoly F . V . Solís

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35515/
---

Review request for samza.


Repository: samza


Description
---

RocksDB statistic


Diffs
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 a423f7bd6c43461e051b5fd1f880dd01db785991 
  
samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 a428a16bc1e9ab4980a6f17db4fd810057d31136 

Diff: https://reviews.apache.org/r/35515/diff/


Testing
---


Thanks,

Gustavo Anatoly F. V. Solís



Re: 3 processed message per incoming message

2015-06-16 Thread Shekar Tippur
This is what I see on Yarn monitoring page:

As we can see, there are 9998 apps pending. There is some 10k limit we are
hitting. I see only 1 app running.


Apps SubmittedApps PendingApps RunningApps CompletedContainers RunningMemory
UsedMemory TotalMemory ReservedVCores UsedVCores TotalVCores ReservedActive
NodesDecommissioned NodesLost NodesUnhealthy NodesRebooted Nodes1000199981222
GB8 GB0 B2801 http://sprdargas403.corp.intuit.net:8088/cluster/nodes0
http://sprdargas403.corp.intuit.net:8088/cluster/nodes/decommissioned0
http://sprdargas403.corp.intuit.net:8088/cluster/nodes/lost0
http://sprdargas403.corp.intuit.net:8088/cluster/nodes/unhealthy0
http://sprdargas403.corp.intuit.net:8088/cluster/nodes/rebooted
Show 20406080100 entries
Search:
ID
User
Name
Application Type
Queue
StartTime
FinishTime
State
FinalStatus
Progress
Tracking UI
application_1431716639228_0120
http://sprdargas403.corp.intuit.net:8088/cluster/app/application_1431716639228_0120
rootArgos_1SamzadefaultFri, 15 May 2015 19:10:21 GMTN/ARUNNINGUNDEFINED
ApplicationMaster
http://sprdargas403.corp.intuit.net:8088/proxy/application_1431716639228_0120/


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
Hi, Shekar,

This 0.9.1 is a bug-fix only release. No features added yet. New features
are expected in 0.10.0.

Thanks!

On Tue, Jun 16, 2015 at 10:59 AM, Shekar Tippur ctip...@gmail.com wrote:

 Wang,

 I have not caught up but can you please highlight if there are any feature
 additions as well?

 - Shekar

 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 wrote:

  Hi all,
 
  We have been running a couple of our jobs against `0.9.1` branch last
 week
  at LinkedIn with some critical bug fixes back-ported, including:
 
  SAMZA-608
  Deserialization error causes SystemConsumers to hang
 
  SAMZA-616
  Shutdown hook does not wait for container to finish
 
  SAMZA-658
  Iterator.remove breaks caching layer
 
  SAMZA-662 / 686
  Samza auto-creates changelog stream without sufficient partitions when
  container number  1
 
  I am proposing a release vote on the current 0.9.1 branch for these bug
  fixes. Thoughts?
 
  -- Guozhang
 



Re: 3 processed message per incoming message

2015-06-16 Thread Yan Fang
Hi Shekar,

Ok. If there is only one application is running, if you kill this one, will
you still be able to see the processed messages coming? If not, I think the
code in your application maybe the cause of the problem. We can have a
further look at your code to see where the problem is.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Tue, Jun 16, 2015 at 7:45 AM, Shekar Tippur ctip...@gmail.com wrote:

 This is what I see on Yarn monitoring page:

 As we can see, there are 9998 apps pending. There is some 10k limit we are
 hitting. I see only 1 app running.


 Apps SubmittedApps PendingApps RunningApps CompletedContainers
 RunningMemory
 UsedMemory TotalMemory ReservedVCores UsedVCores TotalVCores ReservedActive
 NodesDecommissioned NodesLost NodesUnhealthy NodesRebooted
 Nodes1000199981222
 GB8 GB0 B2801 http://sprdargas403.corp.intuit.net:8088/cluster/nodes0
 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/decommissioned0
 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/lost0
 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/unhealthy0
 http://sprdargas403.corp.intuit.net:8088/cluster/nodes/rebooted
 Show 20406080100 entries
 Search:
 ID
 User
 Name
 Application Type
 Queue
 StartTime
 FinishTime
 State
 FinalStatus
 Progress
 Tracking UI
 application_1431716639228_0120
 
 http://sprdargas403.corp.intuit.net:8088/cluster/app/application_1431716639228_0120
 
 rootArgos_1SamzadefaultFri, 15 May 2015 19:10:21 GMTN/ARUNNINGUNDEFINED
 ApplicationMaster
 
 http://sprdargas403.corp.intuit.net:8088/proxy/application_1431716639228_0120/
 



improving hello-samza / testing

2015-06-16 Thread Tim Williams
I'm learning samza by the hello-samza project and notice the lack of
tests.  Where's a good place to learn how folks are properly testing
things written with samza?

Thanks,
--tim


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Gustavo Anatoly
+1

Thanks.

2015-06-16 14:15 GMT-03:00 Yan Fang yanfang...@gmail.com:

 Agreed on this.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 wrote:

  Hi all,
 
  We have been running a couple of our jobs against `0.9.1` branch last
 week
  at LinkedIn with some critical bug fixes back-ported, including:
 
  SAMZA-608
  Deserialization error causes SystemConsumers to hang
 
  SAMZA-616
  Shutdown hook does not wait for container to finish
 
  SAMZA-658
  Iterator.remove breaks caching layer
 
  SAMZA-662 / 686
  Samza auto-creates changelog stream without sufficient partitions when
  container number  1
 
  I am proposing a release vote on the current 0.9.1 branch for these bug
  fixes. Thoughts?
 
  -- Guozhang
 



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Chinmay Soman
+1

On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
nram...@linkedin.com.invalid wrote:

 +1 for the release!

 On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:

 +1 Agreed.
 
 Thanks!
 
 On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote:
 
  Agreed on this.
 
  Thanks,
 
  Fang, Yan
  yanfang...@gmail.com
 
  On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
  wrote:
 
   Hi all,
  
   We have been running a couple of our jobs against `0.9.1` branch last
  week
   at LinkedIn with some critical bug fixes back-ported, including:
  
   SAMZA-608
   Deserialization error causes SystemConsumers to hang
  
   SAMZA-616
   Shutdown hook does not wait for container to finish
  
   SAMZA-658
   Iterator.remove breaks caching layer
  
   SAMZA-662 / 686
   Samza auto-creates changelog stream without sufficient partitions when
   container number  1
  
   I am proposing a release vote on the current 0.9.1 branch for these
 bug
   fixes. Thoughts?
  
   -- Guozhang
  
 




-- 
Thanks and regards

Chinmay Soman


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Guozhang Wang
Cool. I will start a voting process soon.

On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 +1

 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
 nram...@linkedin.com.invalid wrote:

  +1 for the release!
 
  On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:
 
  +1 Agreed.
  
  Thanks!
  
  On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
 wrote:
  
   Agreed on this.
  
   Thanks,
  
   Fang, Yan
   yanfang...@gmail.com
  
   On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
   wrote:
  
Hi all,
   
We have been running a couple of our jobs against `0.9.1` branch
 last
   week
at LinkedIn with some critical bug fixes back-ported, including:
   
SAMZA-608
Deserialization error causes SystemConsumers to hang
   
SAMZA-616
Shutdown hook does not wait for container to finish
   
SAMZA-658
Iterator.remove breaks caching layer
   
SAMZA-662 / 686
Samza auto-creates changelog stream without sufficient partitions
 when
container number  1
   
I am proposing a release vote on the current 0.9.1 branch for these
  bug
fixes. Thoughts?
   
-- Guozhang
   
  
 
 


 --
 Thanks and regards

 Chinmay Soman




-- 
-- Guozhang


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yi Pan
+1 Agreed.

Thanks!

On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote:

 Agreed on this.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 wrote:

  Hi all,
 
  We have been running a couple of our jobs against `0.9.1` branch last
 week
  at LinkedIn with some critical bug fixes back-ported, including:
 
  SAMZA-608
  Deserialization error causes SystemConsumers to hang
 
  SAMZA-616
  Shutdown hook does not wait for container to finish
 
  SAMZA-658
  Iterator.remove breaks caching layer
 
  SAMZA-662 / 686
  Samza auto-creates changelog stream without sufficient partitions when
  container number  1
 
  I am proposing a release vote on the current 0.9.1 branch for these bug
  fixes. Thoughts?
 
  -- Guozhang
 



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Yan Fang
Agreed on this.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi all,

 We have been running a couple of our jobs against `0.9.1` branch last week
 at LinkedIn with some critical bug fixes back-ported, including:

 SAMZA-608
 Deserialization error causes SystemConsumers to hang

 SAMZA-616
 Shutdown hook does not wait for container to finish

 SAMZA-658
 Iterator.remove breaks caching layer

 SAMZA-662 / 686
 Samza auto-creates changelog stream without sufficient partitions when
 container number  1

 I am proposing a release vote on the current 0.9.1 branch for these bug
 fixes. Thoughts?

 -- Guozhang



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Shekar Tippur
Wang,

I have not caught up but can you please highlight if there are any feature
additions as well?

- Shekar

On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com wrote:

 Hi all,

 We have been running a couple of our jobs against `0.9.1` branch last week
 at LinkedIn with some critical bug fixes back-ported, including:

 SAMZA-608
 Deserialization error causes SystemConsumers to hang

 SAMZA-616
 Shutdown hook does not wait for container to finish

 SAMZA-658
 Iterator.remove breaks caching layer

 SAMZA-662 / 686
 Samza auto-creates changelog stream without sufficient partitions when
 container number  1

 I am proposing a release vote on the current 0.9.1 branch for these bug
 fixes. Thoughts?

 -- Guozhang



Re: Review Request 34974: SAMZA-676: implement broadcast stream

2015-06-16 Thread Yan Fang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34974/
---

(Updated June 16, 2015, 9:06 p.m.)


Review request for samza.


Changes
---

rebased to latest master


Bugs: SAMZA-676
https://issues.apache.org/jira/browse/SAMZA-676


Repository: samza


Description
---

1. added offsetComparator method in SystemAdmin Interface

2. added task.global.inputs config

3. rewrote Grouper classes using Java; allows to assign global streams during 
grouping

4. used LinkedHashSet instead of HashSet in CoordinatorStreamSystemConsumer to 
preserve messages order

5. added taskNames to the offsets in OffsetManager

6. allowed to assign one SSP to multiple taskInstances

7. skipped already-processed messages in RunLoop

8. unit tests for all changes


Diffs (updated)
-

  checkstyle/import-control.xml 3374f0c 
  samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 7a588eb 
  
samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
 249b8ae 
  samza-core/src/main/java/org/apache/samza/config/TaskConfigJava.java 
PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartition.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartitionFactory.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java
 PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
20e5d26 
  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala e4b14f4 
  samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
cbacd18 
  samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
c5a5ea5 
  
samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupByPartition.scala
 44e95fc 
  
samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.scala
 3c0acad 
  
samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala
 097f410 
  samza-core/src/test/java/org/apache/samza/config/TestTaskConfigJava.java 
PRE-CREATION 
  
samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupByPartition.java
 PRE-CREATION 
  
samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java
 PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 
8d54c46 
  samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
64a5844 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
9fb1aa9 
  samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 
7caad28 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/GroupByTestBase.scala
 a14169b 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupByPartition.scala
 74daf72 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.scala
 deb3895 
  
samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala 
d9ae187 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
35086f5 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
 de00320 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
 1629035 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala
 2a84328 
  samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java 
b063366 
  
samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala
 1e936b4 

Diff: https://reviews.apache.org/r/34974/diff/


Testing
---


Thanks,

Yan Fang



[DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Guozhang Wang
Hi all,

We have been running a couple of our jobs against `0.9.1` branch last week
at LinkedIn with some critical bug fixes back-ported, including:

SAMZA-608
Deserialization error causes SystemConsumers to hang

SAMZA-616
Shutdown hook does not wait for container to finish

SAMZA-658
Iterator.remove breaks caching layer

SAMZA-662 / 686
Samza auto-creates changelog stream without sufficient partitions when
container number  1

I am proposing a release vote on the current 0.9.1 branch for these bug
fixes. Thoughts?

-- Guozhang


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Navina Ramesh
+1 for the release!

On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:

+1 Agreed.

Thanks!

On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote:

 Agreed on this.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 wrote:

  Hi all,
 
  We have been running a couple of our jobs against `0.9.1` branch last
 week
  at LinkedIn with some critical bug fixes back-ported, including:
 
  SAMZA-608
  Deserialization error causes SystemConsumers to hang
 
  SAMZA-616
  Shutdown hook does not wait for container to finish
 
  SAMZA-658
  Iterator.remove breaks caching layer
 
  SAMZA-662 / 686
  Samza auto-creates changelog stream without sufficient partitions when
  container number  1
 
  I am proposing a release vote on the current 0.9.1 branch for these
bug
  fixes. Thoughts?
 
  -- Guozhang
 




Re: Review Request 34974: SAMZA-676: implement broadcast stream

2015-06-16 Thread Yan Fang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34974/
---

(Updated June 16, 2015, 9:23 p.m.)


Review request for samza.


Changes
---

latest patch


Bugs: SAMZA-676
https://issues.apache.org/jira/browse/SAMZA-676


Repository: samza


Description
---

1. added offsetComparator method in SystemAdmin Interface

2. added task.global.inputs config

3. rewrote Grouper classes using Java; allows to assign global streams during 
grouping

4. used LinkedHashSet instead of HashSet in CoordinatorStreamSystemConsumer to 
preserve messages order

5. added taskNames to the offsets in OffsetManager

6. allowed to assign one SSP to multiple taskInstances

7. skipped already-processed messages in RunLoop

8. unit tests for all changes


Diffs (updated)
-

  checkstyle/import-control.xml 3374f0c 
  samza-api/src/main/java/org/apache/samza/system/SystemAdmin.java 7a588eb 
  
samza-api/src/main/java/org/apache/samza/util/SinglePartitionWithoutOffsetsSystemAdmin.java
 249b8ae 
  samza-core/src/main/java/org/apache/samza/config/TaskConfigJava.java 
PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartition.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupByPartitionFactory.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.java
 PRE-CREATION 
  
samza-core/src/main/java/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartitionFactory.java
 PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
20e5d26 
  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala e4b14f4 
  samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala c292ae4 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
cbacd18 
  samza-core/src/main/scala/org/apache/samza/container/TaskInstance.scala 
c5a5ea5 
  
samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupByPartition.scala
 44e95fc 
  
samza-core/src/main/scala/org/apache/samza/container/grouper/stream/GroupBySystemStreamPartition.scala
 3c0acad 
  
samza-core/src/main/scala/org/apache/samza/system/filereader/FileReaderSystemAdmin.scala
 097f410 
  samza-core/src/test/java/org/apache/samza/config/TestTaskConfigJava.java 
PRE-CREATION 
  
samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupByPartition.java
 PRE-CREATION 
  
samza-core/src/test/java/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.java
 PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 
8d54c46 
  samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
64a5844 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
9fb1aa9 
  samza-core/src/test/scala/org/apache/samza/container/TestTaskInstance.scala 
7caad28 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/GroupByTestBase.scala
 a14169b 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupByPartition.scala
 74daf72 
  
samza-core/src/test/scala/org/apache/samza/container/grouper/stream/TestGroupBySystemStreamPartition.scala
 deb3895 
  
samza-core/src/test/scala/org/apache/samza/coordinator/TestJobCoordinator.scala 
d9ae187 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemAdmin.scala 
35086f5 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemConsumer.scala
 de00320 
  
samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemFactory.scala
 1629035 
  
samza-kafka/src/test/scala/org/apache/samza/system/kafka/TestKafkaSystemConsumer.scala
 2a84328 
  samza-test/src/main/java/org/apache/samza/system/mock/MockSystemAdmin.java 
b063366 
  
samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterTaskManager.scala
 1e936b4 

Diff: https://reviews.apache.org/r/34974/diff/


Testing
---


Thanks,

Yan Fang



Re: improving hello-samza / testing

2015-06-16 Thread Chinmay Soman
We've built a driver program which kinda falls along approach (1) listed in
your email.

The driver program accepts a custom task object and has a way to inject
data - which in turn invokes the process method. For now we're assuming
logical time and use the frequency of process() invocations to deduce when
to invoke the window() method (for eg: invoke window once for every 4 calls
to process).

We've also built our own Incoming and Outgoing envelope - which is just in
the form of a Java List. This is how the result is evaluated (either get
the full result list or define a callback which is invoked every time a
collector.send is called). Its still work in progress and the goal is to
make unit testing along the lines of Storm unit tests.

On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini criccom...@apache.org
wrote:

 Hey Tim,

 This is a really good discussion to have. The testing that I've seen with
 Samza falls into two categories:

 1. Instantiate your StreamTask, and mock all params in the process()/init()
 methods.
 2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds
 messages into a topic, and validates it gets messages back out from the
 output topic.
 3. A full blown integration test that uses Zopkio.

 For an example of (2), in practice, have a look at TestStatefuleTask:



 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD

 As you can see, writing this kind of integration test can be a bit painful.

 (3) is documented here:

   http://samza.apache.org/contribute/tests.html

 Another way to test would be to start a full-blown container using
 ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out
 the system consumer/system producer.

 Has anyone else tested Samza in other ways?

 Cheers,
 Chris

 On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com
 wrote:

  I'm learning samza by the hello-samza project and notice the lack of
  tests.  Where's a good place to learn how folks are properly testing
  things written with samza?
 
  Thanks,
  --tim
 




-- 
Thanks and regards

Chinmay Soman


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Percy Wegmann
Thank you!

Sent using CloudMagichttps://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2

On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org wrote:
+1 Here.

On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote:

 Cool. I will start a voting process soon.

 On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com
 
 wrote:

  +1
 
  On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
  nram...@linkedin.com.invalid wrote:
 
   +1 for the release!
  
   On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:
  
   +1 Agreed.
   
   Thanks!
   
   On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
  wrote:
   
Agreed on this.
   
Thanks,
   
Fang, Yan
yanfang...@gmail.com
   
On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 
wrote:
   
 Hi all,

 We have been running a couple of our jobs against `0.9.1` branch
  last
week
 at LinkedIn with some critical bug fixes back-ported, including:

 SAMZA-608
 Deserialization error causes SystemConsumers to hang

 SAMZA-616
 Shutdown hook does not wait for container to finish

 SAMZA-658
 Iterator.remove breaks caching layer

 SAMZA-662 / 686
 Samza auto-creates changelog stream without sufficient partitions
  when
 container number  1

 I am proposing a release vote on the current 0.9.1 branch for
 these
   bug
 fixes. Thoughts?

 -- Guozhang

   
  
  
 
 
  --
  Thanks and regards
 
  Chinmay Soman
 



 --
 -- Guozhang



Measuring Samza Job Throughput

2015-06-16 Thread Milinda Pathirage
Hi Devs,

I was looking for a way to measure Samza job throughput and found that its
possible to do it via Samza's metrics reporter. But there several types of
metrics reported via this method. For example, TaskInstanceMetrics reports
number of messages sent. But if I wanted to get a measurement like bytes
per second produced, is there a way to do that. It looks
like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide number
of messages sent.

If any of you have any experience in measuring Samza job throughput, can
you please share. Really appreciate any ideas on measuring job throughput.

Thanks
Milinda
-- 
Milinda Pathirage

PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University

twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org


Re: improving hello-samza / testing

2015-06-16 Thread Luis Fernando De Pombo
https://issues.apache.org/jira/browse/SAMZA-681 tracks the first effort
towards the driver program or unit test harness for samza tasks that
Chinmay is referring to.
ᐧ

On Tue, Jun 16, 2015 at 6:11 PM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 We've built a driver program which kinda falls along approach (1) listed in
 your email.

 The driver program accepts a custom task object and has a way to inject
 data - which in turn invokes the process method. For now we're assuming
 logical time and use the frequency of process() invocations to deduce when
 to invoke the window() method (for eg: invoke window once for every 4 calls
 to process).

 We've also built our own Incoming and Outgoing envelope - which is just in
 the form of a Java List. This is how the result is evaluated (either get
 the full result list or define a callback which is invoked every time a
 collector.send is called). Its still work in progress and the goal is to
 make unit testing along the lines of Storm unit tests.

 On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini criccom...@apache.org
 wrote:

  Hey Tim,
 
  This is a really good discussion to have. The testing that I've seen with
  Samza falls into two categories:
 
  1. Instantiate your StreamTask, and mock all params in the
 process()/init()
  methods.
  2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds
  messages into a topic, and validates it gets messages back out from the
  output topic.
  3. A full blown integration test that uses Zopkio.
 
  For an example of (2), in practice, have a look at TestStatefuleTask:
 
 
 
 
 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD
 
  As you can see, writing this kind of integration test can be a bit
 painful.
 
  (3) is documented here:
 
http://samza.apache.org/contribute/tests.html
 
  Another way to test would be to start a full-blown container using
  ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock
 out
  the system consumer/system producer.
 
  Has anyone else tested Samza in other ways?
 
  Cheers,
  Chris
 
  On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com
  wrote:
 
   I'm learning samza by the hello-samza project and notice the lack of
   tests.  Where's a good place to learn how folks are properly testing
   things written with samza?
  
   Thanks,
   --tim
  
 



 --
 Thanks and regards

 Chinmay Soman



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Shekar Tippur
+1
On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com wrote:

 Thank you!

 Sent using CloudMagic
 https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2

 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org
 wrote:
 +1 Here.

 On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com
 wrote:

  Cool. I will start a voting process soon.
 
  On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman 
 chinmay.cere...@gmail.com
  
  wrote:
 
   +1
  
   On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
   nram...@linkedin.com.invalid wrote:
  
+1 for the release!
   
On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:
   
+1 Agreed.

Thanks!

On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
   wrote:

 Agreed on this.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang 
 wangg...@gmail.com
  
 wrote:

  Hi all,
 
  We have been running a couple of our jobs against `0.9.1` branch
   last
 week
  at LinkedIn with some critical bug fixes back-ported, including:
 
  SAMZA-608
  Deserialization error causes SystemConsumers to hang
 
  SAMZA-616
  Shutdown hook does not wait for container to finish
 
  SAMZA-658
  Iterator.remove breaks caching layer
 
  SAMZA-662 / 686
  Samza auto-creates changelog stream without sufficient
 partitions
   when
  container number  1
 
  I am proposing a release vote on the current 0.9.1 branch for
  these
bug
  fixes. Thoughts?
 
  -- Guozhang
 

   
   
  
  
   --
   Thanks and regards
  
   Chinmay Soman
  
 
 
 
  --
  -- Guozhang
 



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread sriram
+1

On Tue, Jun 16, 2015 at 7:41 PM, Shekar Tippur ctip...@gmail.com wrote:

 +1
 On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com
 wrote:

  Thank you!
 
  Sent using CloudMagic
  https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2
 
  On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org
  wrote:
  +1 Here.
 
  On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com
  wrote:
 
   Cool. I will start a voting process soon.
  
   On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman 
  chinmay.cere...@gmail.com
   
   wrote:
  
+1
   
On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
nram...@linkedin.com.invalid wrote:
   
 +1 for the release!

 On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:

 +1 Agreed.
 
 Thanks!
 
 On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
wrote:
 
  Agreed on this.
 
  Thanks,
 
  Fang, Yan
  yanfang...@gmail.com
 
  On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang 
  wangg...@gmail.com
   
  wrote:
 
   Hi all,
  
   We have been running a couple of our jobs against `0.9.1`
 branch
last
  week
   at LinkedIn with some critical bug fixes back-ported,
 including:
  
   SAMZA-608
   Deserialization error causes SystemConsumers to hang
  
   SAMZA-616
   Shutdown hook does not wait for container to finish
  
   SAMZA-658
   Iterator.remove breaks caching layer
  
   SAMZA-662 / 686
   Samza auto-creates changelog stream without sufficient
  partitions
when
   container number  1
  
   I am proposing a release vote on the current 0.9.1 branch for
   these
 bug
   fixes. Thoughts?
  
   -- Guozhang
  
 


   
   
--
Thanks and regards
   
Chinmay Soman
   
  
  
  
   --
   -- Guozhang
  
 



Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Naveen Somasundaram
+1 

 On Jun 16, 2015, at 7:41 PM, Shekar Tippur ctip...@gmail.com wrote:
 
 +1
 On Jun 16, 2015 6:39 PM, Percy Wegmann percy.wegm...@evariant.com wrote:
 
 Thank you!
 
 Sent using CloudMagic
 https://cloudmagic.com/k/d/mailapp?ct=picv=6.0.64pv=8.2
 
 On Tue, Jun 16, 2015 at 8:11 PM, Chris Riccomini criccom...@apache.org
 wrote:
 +1 Here.
 
 On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com
 wrote:
 
 Cool. I will start a voting process soon.
 
 On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman 
 chinmay.cere...@gmail.com
 
 wrote:
 
 +1
 
 On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
 nram...@linkedin.com.invalid wrote:
 
 +1 for the release!
 
 On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:
 
 +1 Agreed.
 
 Thanks!
 
 On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
 wrote:
 
 Agreed on this.
 
 Thanks,
 
 Fang, Yan
 yanfang...@gmail.com
 
 On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang 
 wangg...@gmail.com
 
 wrote:
 
 Hi all,
 
 We have been running a couple of our jobs against `0.9.1` branch
 last
 week
 at LinkedIn with some critical bug fixes back-ported, including:
 
 SAMZA-608
 Deserialization error causes SystemConsumers to hang
 
 SAMZA-616
 Shutdown hook does not wait for container to finish
 
 SAMZA-658
 Iterator.remove breaks caching layer
 
 SAMZA-662 / 686
 Samza auto-creates changelog stream without sufficient
 partitions
 when
 container number  1
 
 I am proposing a release vote on the current 0.9.1 branch for
 these
 bug
 fixes. Thoughts?
 
 -- Guozhang
 
 
 
 
 
 
 --
 Thanks and regards
 
 Chinmay Soman
 
 
 
 
 --
 -- Guozhang
 
 



Re: Review Request 35241: refactoring the code for coordinator stream writer

2015-06-16 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35241/#review88174
---



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
 (line 41)
https://reviews.apache.org/r/35241/#comment140582

Can you add a .sh wrapper using run-job to actually run the job from 
command line?
Take a look at samza-shell/src/main/bash/checkpoint-tool.sh. You can follow 
a similar pattern to run the CoordinatorStreamWriter class.



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
 (line 61)
https://reviews.apache.org/r/35241/#comment140581

Do we really need this check? There are no other components starting the 
same write thread.



samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
 (line 125)
https://reviews.apache.org/r/35241/#comment140580

I was thinking about how a continuous input is more useful than a one-time 
command. I think it is safer to expose / allow writing only 1 config change at 
a time. 
This will make input validation simpler and also, avoid the job-coordinator 
to react to all config changes at the same time. 

Can you change this to input only 1 config key/value pair at a time ?



samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java
 (line 60)
https://reviews.apache.org/r/35241/#comment140583

nit: spacing


- Navina Ramesh


On June 9, 2015, 11:53 p.m., Shadi A. Noghabi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/35241/
 ---
 
 (Updated June 9, 2015, 11:53 p.m.)
 
 
 Review request for samza, Yi Pan (Data Infrastructure), Navina Ramesh, and 
 Naveen Somasundaram.
 
 
 Repository: samza
 
 
 Description
 ---
 
 In order to be able to change configurations while a job is running, a tool 
 for writing a message to the coordinator stream is needed. This code targets 
 creating such a tool that can write messages to the coordinator stream after 
 the bootstrap of the job. This code is related to the SAMZA-704 JIRA.
 
 To run the code use the folowing command:
 
 path to samza deployment/bin/run-class.sh 
 org.apache.samza.coordinator.stream.CoordinatorStreamWriter 
 --config-factory=config factory --config-path=path to config file of a job
 
 
 Diffs
 -
 
   checkstyle/import-control.xml 3374f0c432e61ac4cda275377604cfd481f0cddf 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamMessage.java
  6c1e488d00d8593d59c89b57e673e0b6b90fd7d2 
   
 samza-core/src/main/java/org/apache/samza/coordinator/stream/CoordinatorStreamWriter.java
  PRE-CREATION 
   
 samza-core/src/test/java/org/apache/samza/coordinator/stream/MockCoordinatorStreamSystemFactory.java
  647cadb3a4e51bec8204197d77ad35a6b29afcec 
   
 samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamSystemProducer.java
  68e32554c18f443565284b807f43f4a5b05afbce 
   
 samza-core/src/test/java/org/apache/samza/coordinator/stream/TestCoordinatorStreamWriter.java
  PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/35241/diff/
 
 
 Testing
 ---
 
 
 Thanks,
 
 Shadi A. Noghabi
 




Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Chris Riccomini
+1 Here.

On Tue, Jun 16, 2015 at 12:01 PM, Guozhang Wang wangg...@gmail.com wrote:

 Cool. I will start a voting process soon.

 On Tue, Jun 16, 2015 at 11:55 AM, Chinmay Soman chinmay.cere...@gmail.com
 
 wrote:

  +1
 
  On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
  nram...@linkedin.com.invalid wrote:
 
   +1 for the release!
  
   On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:
  
   +1 Agreed.
   
   Thanks!
   
   On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com
  wrote:
   
Agreed on this.
   
Thanks,
   
Fang, Yan
yanfang...@gmail.com
   
On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
 
wrote:
   
 Hi all,

 We have been running a couple of our jobs against `0.9.1` branch
  last
week
 at LinkedIn with some critical bug fixes back-ported, including:

 SAMZA-608
 Deserialization error causes SystemConsumers to hang

 SAMZA-616
 Shutdown hook does not wait for container to finish

 SAMZA-658
 Iterator.remove breaks caching layer

 SAMZA-662 / 686
 Samza auto-creates changelog stream without sufficient partitions
  when
 container number  1

 I am proposing a release vote on the current 0.9.1 branch for
 these
   bug
 fixes. Thoughts?

 -- Guozhang

   
  
  
 
 
  --
  Thanks and regards
 
  Chinmay Soman
 



 --
 -- Guozhang



Re: improving hello-samza / testing

2015-06-16 Thread Chris Riccomini
Hey Tim,

This is a really good discussion to have. The testing that I've seen with
Samza falls into two categories:

1. Instantiate your StreamTask, and mock all params in the process()/init()
methods.
2. A mini-ontegration test that starts ZooKeeper, and Kafka, and feeds
messages into a topic, and validates it gets messages back out from the
output topic.
3. A full blown integration test that uses Zopkio.

For an example of (2), in practice, have a look at TestStatefuleTask:


https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD

As you can see, writing this kind of integration test can be a bit painful.

(3) is documented here:

  http://samza.apache.org/contribute/tests.html

Another way to test would be to start a full-blown container using
ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out
the system consumer/system producer.

Has anyone else tested Samza in other ways?

Cheers,
Chris

On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com wrote:

 I'm learning samza by the hello-samza project and notice the lack of
 tests.  Where's a good place to learn how folks are properly testing
 things written with samza?

 Thanks,
 --tim



Powered by page update

2015-06-16 Thread Chris Riccomini
Hey all,

I'm seeing a lot of new faces on the mailing list, which is really awesome.
I want to invite you all to add yourselves to our Powered by page:

https://cwiki.apache.org/confluence/display/SAMZA/Powered+By

The Apache wiki is pretty locked down due to spam. If you'd like to send me
a link and short write-up, I'll be happy to add your entry to the page,
though.

Cheers,
Chris


Re: Review Request 35492: SAMZA-701 : Hello Samza - Port docker setup from hadoop-common

2015-06-16 Thread Darrell Taylor

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35492/
---

(Updated June 16, 2015, 7:36 a.m.)


Review request for samza.


Summary (updated)
-

SAMZA-701 : Hello Samza - Port docker setup from hadoop-common


Repository: samza-hello-samza


Description
---

Take the really useful docker setup from avro and hadoop-common and make it 
work for hello samza


Diffs
-

  README.md 4463454 
  conf/yarn-site.xml 9028590 
  dev-support/docker/Dockerfile PRE-CREATION 
  dev-support/docker/hadoop_env_checks.sh PRE-CREATION 
  start-env.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/35492/diff/


Testing
---

* Run ./start-env.sh from the top level directory
* Followed the instructions from Start a Grid on thsi page : 
http://samza.apache.org/startup/hello-samza/0.8/


Thanks,

Darrell Taylor



Review Request 35492: Port docker setup from hadoop-common

2015-06-16 Thread Darrell Taylor

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35492/
---

Review request for samza.


Repository: samza-hello-samza


Description
---

Take the really useful docker setup from avro and hadoop-common and make it 
work for hello samza


Diffs
-

  README.md 4463454 
  conf/yarn-site.xml 9028590 
  dev-support/docker/Dockerfile PRE-CREATION 
  dev-support/docker/hadoop_env_checks.sh PRE-CREATION 
  start-env.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/35492/diff/


Testing
---

* Run ./start-env.sh from the top level directory
* Followed the instructions from Start a Grid on thsi page : 
http://samza.apache.org/startup/hello-samza/0.8/


Thanks,

Darrell Taylor