Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-08 Thread Xinyu Liu


> On Nov. 2, 2016, 7:01 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/util/TimerClock.java, line 25
> > 
> >
> > We have a HighResolutionClock that does this. I think you can use it 
> > here.
> 
> Xinyu Liu wrote:
> I was thinking about using it at first. But the HighResolutionClock has 
> two functions, nanoTime() and sleep(). The second method does not fit here 
> for my usage. The other benefit of using a single method interface here is 
> that I can do lambda function, which makes code nicer to read :).
> 
> Or I can remove the nanoTime() interface from HighResolutionClock and let 
> HighResolutionClock extends from TimerClock. Does this sound better to you?
> 
> Chris Pettitt wrote:
> I would probably pull sleep off of HighResolutionClock. It looks like it 
> may only be used in one place now which means it could just be inlined. 
> Otherwise it could be added to a util class.

Good suggestion. I inline the sleep inside the ThrottlingExcutor and now we 
only have a single HighResolutionClock class.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53282/#review154611
---


On Nov. 9, 2016, 2:14 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53282/
> ---
> 
> (Updated Nov. 9, 2016, 2:14 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> In the recent experiments of samza batch job (consuming hdfs data on hadoop), 
> the results are subpar to map/reduce and spark. By looking at the metrics 
> closely, we found two basic problems:
> 
> 1) Not enough data to process. This is spotted as the unprocessed message 
> queue length was zero for quite a lot of times.
> 
> 2) Not process fast enough. We found samza performed closely in both median 
> size records (100B) and small record (10B), while spark can scale very well 
> in the small record (over 1M/s).
> 
> The first problem is solved by increasing the buffer size. This ticket is to 
> address the second problem, which contains three major improvements:
> 
> - Option to turn off timer metrics calculation: one of the main time spent in 
> samza processing turns out to be just keeping the timer metrics. While it is 
> useful in debugging, it becomes a bottleneck when running a stable job with 
> high performance. In my testing job which consumes 8M mock data, it took 30 
> secs with timer metrics on. After turning it off, it only took 14 secs.
> 
> - Java coding improvements: The AsyncRunLoop code can be further optimized 
> for efficiency. Some of the thread-safe data structure I am using is not for 
> optimal performance (Collections.synchronizedSet). I switched to use 
> CopyOnWriteArraySet, which has far better performance due to more reads and 
> small set size.
> 
> - Specific handling for in-order processing improvements: AsyncRunLoop 
> handles the callbacks regardless of whether it's in-order or out-of-order 
> (max concurrency > 1), which incurs quite some cost. By simplying the logic 
> for in-order handling, the performance gains.
> 
> After all three improvements, my test job with mock input (8M messages) can 
> be processed within 8 sec (down from org 30 secs), so it's 1M/s for one cpu 
> core.
> 
> For the performance benchmark jobs running in Hadoop, we also see a 4 times 
> improvement with all the fixes above. Please take a look at the attached 
> spreedsheet (see the numbers with fix(turn off the timing metrics) and 
> fix2(all three together).
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
> 609a956a1f2fa97419c2f66fe2fb6876aaaeecd0 
>   samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java 
> 8fac8155c7f64e67d4a39ec6943f98da1e1d63d9 
>   samza-core/src/main/java/org/apache/samza/task/CoordinatorRequests.java 
> 052b3b91ec609ca6288662cfa2d3e71b0273d020 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackImpl.java 
> 9b700998d2af040c6734289f7f28bbd78c36bd2c 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java 
> 132cf59eb593524a4cac134aeceeeb37a4c74b1f 
>   samza-core/src/main/java/org/apache/samza/util/HighResolutionClock.java 
> 69ba441ed087305dfe4e1272b00fad67b644e13f 
>   
> samza-core/src/main/java/org/apache/samza/util/SystemHighResolutionClock.java 
> 2e65b603bc959273e679eba3ea9f89c7c0c4664d 
>   samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java 
> d1298fc40680e5ad4db7067c9ef02f0266dffc1d 
>   

Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-08 Thread Xinyu Liu


> On Nov. 2, 2016, 7:57 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, 
> > line 65
> > 
> >
> > I don't think we should ever be creating a default new SerdeManager 
> > just for this class. Same for SystemConsumerMetrics (and other places where 
> > this pattern is used for objects). Constants are fine.
> 
> Xinyu Liu wrote:
> I think these are created as default for the testing purpose. Normally we 
> will pass in the real SerdeManager.
> 
> Prateek Maheshwari wrote:
> I'd still argue that we shouldn't do this.
> 1. It makes it possible to accidentally forget passing objects which are 
> actually necessary for this class to be functional.
> 2. Looking at the signature of these constructors, there's no indication 
> whether the field is really required or optional.
> 2. We rely on this pattern in other places for automatically creating 
> objects. It makes looking up what classes are being used where more difficult.
> 
> The only benefit is saving a few characters when instantiating in tests. 
> Would strongly prefer always passing objects explicitly.

Talked offline. We are not going to refactor this legacy stuff for now.


- Xinyu


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53282/#review154602
---


On Nov. 9, 2016, 2:14 a.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53282/
> ---
> 
> (Updated Nov. 9, 2016, 2:14 a.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> In the recent experiments of samza batch job (consuming hdfs data on hadoop), 
> the results are subpar to map/reduce and spark. By looking at the metrics 
> closely, we found two basic problems:
> 
> 1) Not enough data to process. This is spotted as the unprocessed message 
> queue length was zero for quite a lot of times.
> 
> 2) Not process fast enough. We found samza performed closely in both median 
> size records (100B) and small record (10B), while spark can scale very well 
> in the small record (over 1M/s).
> 
> The first problem is solved by increasing the buffer size. This ticket is to 
> address the second problem, which contains three major improvements:
> 
> - Option to turn off timer metrics calculation: one of the main time spent in 
> samza processing turns out to be just keeping the timer metrics. While it is 
> useful in debugging, it becomes a bottleneck when running a stable job with 
> high performance. In my testing job which consumes 8M mock data, it took 30 
> secs with timer metrics on. After turning it off, it only took 14 secs.
> 
> - Java coding improvements: The AsyncRunLoop code can be further optimized 
> for efficiency. Some of the thread-safe data structure I am using is not for 
> optimal performance (Collections.synchronizedSet). I switched to use 
> CopyOnWriteArraySet, which has far better performance due to more reads and 
> small set size.
> 
> - Specific handling for in-order processing improvements: AsyncRunLoop 
> handles the callbacks regardless of whether it's in-order or out-of-order 
> (max concurrency > 1), which incurs quite some cost. By simplying the logic 
> for in-order handling, the performance gains.
> 
> After all three improvements, my test job with mock input (8M messages) can 
> be processed within 8 sec (down from org 30 secs), so it's 1M/s for one cpu 
> core.
> 
> For the performance benchmark jobs running in Hadoop, we also see a 4 times 
> improvement with all the fixes above. Please take a look at the attached 
> spreedsheet (see the numbers with fix(turn off the timing metrics) and 
> fix2(all three together).
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
> 609a956a1f2fa97419c2f66fe2fb6876aaaeecd0 
>   samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java 
> 8fac8155c7f64e67d4a39ec6943f98da1e1d63d9 
>   samza-core/src/main/java/org/apache/samza/task/CoordinatorRequests.java 
> 052b3b91ec609ca6288662cfa2d3e71b0273d020 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackImpl.java 
> 9b700998d2af040c6734289f7f28bbd78c36bd2c 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java 
> 132cf59eb593524a4cac134aeceeeb37a4c74b1f 
>   samza-core/src/main/java/org/apache/samza/util/HighResolutionClock.java 
> 69ba441ed087305dfe4e1272b00fad67b644e13f 
>   
> samza-core/src/main/java/org/apache/samza/util/SystemHighResolutionClock.java 
> 2e65b603bc959273e679eba3ea9f89c7c0c4664d 
>   

Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-08 Thread Xinyu Liu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53282/
---

(Updated Nov. 9, 2016, 2:14 a.m.)


Review request for samza, Chris Pettitt, Jake Maes, and Navina Ramesh.


Changes
---

Updates from Chris's feedback.


Repository: samza


Description
---

In the recent experiments of samza batch job (consuming hdfs data on hadoop), 
the results are subpar to map/reduce and spark. By looking at the metrics 
closely, we found two basic problems:

1) Not enough data to process. This is spotted as the unprocessed message queue 
length was zero for quite a lot of times.

2) Not process fast enough. We found samza performed closely in both median 
size records (100B) and small record (10B), while spark can scale very well in 
the small record (over 1M/s).

The first problem is solved by increasing the buffer size. This ticket is to 
address the second problem, which contains three major improvements:

- Option to turn off timer metrics calculation: one of the main time spent in 
samza processing turns out to be just keeping the timer metrics. While it is 
useful in debugging, it becomes a bottleneck when running a stable job with 
high performance. In my testing job which consumes 8M mock data, it took 30 
secs with timer metrics on. After turning it off, it only took 14 secs.

- Java coding improvements: The AsyncRunLoop code can be further optimized for 
efficiency. Some of the thread-safe data structure I am using is not for 
optimal performance (Collections.synchronizedSet). I switched to use 
CopyOnWriteArraySet, which has far better performance due to more reads and 
small set size.

- Specific handling for in-order processing improvements: AsyncRunLoop handles 
the callbacks regardless of whether it's in-order or out-of-order (max 
concurrency > 1), which incurs quite some cost. By simplying the logic for 
in-order handling, the performance gains.

After all three improvements, my test job with mock input (8M messages) can be 
processed within 8 sec (down from org 30 secs), so it's 1M/s for one cpu core.

For the performance benchmark jobs running in Hadoop, we also see a 4 times 
improvement with all the fixes above. Please take a look at the attached 
spreedsheet (see the numbers with fix(turn off the timing metrics) and fix2(all 
three together).


Diffs (updated)
-

  samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
609a956a1f2fa97419c2f66fe2fb6876aaaeecd0 
  samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java 
8fac8155c7f64e67d4a39ec6943f98da1e1d63d9 
  samza-core/src/main/java/org/apache/samza/task/CoordinatorRequests.java 
052b3b91ec609ca6288662cfa2d3e71b0273d020 
  samza-core/src/main/java/org/apache/samza/task/TaskCallbackImpl.java 
9b700998d2af040c6734289f7f28bbd78c36bd2c 
  samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java 
132cf59eb593524a4cac134aeceeeb37a4c74b1f 
  samza-core/src/main/java/org/apache/samza/util/HighResolutionClock.java 
69ba441ed087305dfe4e1272b00fad67b644e13f 
  samza-core/src/main/java/org/apache/samza/util/SystemHighResolutionClock.java 
2e65b603bc959273e679eba3ea9f89c7c0c4664d 
  samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java 
d1298fc40680e5ad4db7067c9ef02f0266dffc1d 
  samza-core/src/main/java/org/apache/samza/util/TimerClock.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/util/Utils.java 
472e0a59d5aa992b136292c8a3347c311e2cd606 
  samza-core/src/main/scala/org/apache/samza/config/MetricsConfig.scala 
c3fd8bfb2e16a4c5146d34682d04cb1d4e9bbe72 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
e0468ee89c89fd720834461771ebb36475475bcb 
  samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala 
e2aed5b1c2e77a914268963b21809380972037b6 
  samza-core/src/main/scala/org/apache/samza/util/Util.scala 
c4836f202f7eda1d4e71eac94fd48e46207b0316 
  samza-core/src/test/java/org/apache/samza/task/TestAsyncRunLoop.java 
6000ffaf2b8723d48a72e58b571f242a42dc8128 
  samza-core/src/test/java/org/apache/samza/task/TestAsyncStreamAdapter.java 
99e1e18bcfa6bca1e275d8ae030a77ff8d70a4eb 
  samza-core/src/test/java/org/apache/samza/task/TestTaskCallbackImpl.java 
f1dbf35165e6ddfc02e3522887c25d78a4bbfcd7 
  samza-core/src/test/java/org/apache/samza/task/TestTaskCallbackManager.java 
d7110f34a9eae6e9ffc15b4982bfbb180da88b2d 
  samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java 
0276e6b17bc9ad9413611189b2e9ff2b8793694c 
  
samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
 c975893a42689732c39c39600fecacee843bf9d6 

Diff: https://reviews.apache.org/r/53282/diff/


Testing
---

./gradlew build

Tested in the yarn hadoop cluster with different kinds of jobs.


File Attachments


hdfs performance
  

Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2016-11-08 Thread Shanthoosh Venkataraman


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 55
> > 
> >
> > What's the difference b/w containerId and containerName?
> 
> Shanthoosh Venkataraman wrote:
> Container Name is the unique name that identifies a container within a 
> job. Exposing this information is useful when killing containers(performing 
> related admin actions on it.). Currently container name is generated 
> prefixing container id with a string.
> 
> Prateek Maheshwari wrote:
> I don't think adding a separate container name concept is necessary. The 
> containerId is sufficient to uniquely identify the container. The 
> "samza-container" prefix makes sense in the MDC for disambiguation in the log 
> lines, but not sure what it buys us here.

Removed container Name from the response.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
> >  lines 99-102
> > 
> >
> > Not required for the other place they're used?
> 
> Shanthoosh Venkataraman wrote:
> Yes.
> 
> Prateek Maheshwari wrote:
> "Yes, it's required" or "Yes, it's not required"? If the latter, why?

Yes, it's required. It was added in the refactoring as well.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > line 104
> > 
> >
> > Don't think we need a Util method for a string concat.
> 
> Shanthoosh Venkataraman wrote:
> Container name is used in multiple places(SamzaContainer and 
> TasksResource). The way in which the container name is constructed from 
> container id could change in the future, hence this was added just to 
> encapsulate that here.
> 
> Prateek Maheshwari wrote:
> See comment above.

Removed container name. Just exposing containerId.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxy.java,
> >  line 52
> > 
> >
> > config.getConfigJobConfigFactory?

Result of careless name refactoring. Fixed.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
> >  line 75
> > 
> >
> > Should take MetricsRegistry, not MetricsRegistryMap. Same everywhere 
> > else, here and in samza-rest.

Making this change in samza-rest in straight forward. However, doing it in 
Samza-core is very hard. MetricsRegistryMap in this constructor is dependent 
upon by the constructor of the following classes ClusterBasedJobCoordinator, 
ContainerProcessManager, ContainerProcessManagerMetrics. Cost associated with 
this change would be touching a lot of classes and their associated tests. We 
should punt this for now and do it later.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala, line 30
> > 
> >
> > Prefer explicit imports

Done.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 122
> > 
> >
> > "within the installation path"

Done.


> On Nov. 3, 2016, 6:32 a.m., Prateek Maheshwari wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 116
> > 
> >
> > Don't split class names: "TaskProxyFactory", "TaskProxy".
> > 
> > What's the  for? Did you mean a ?
> > 
> > "Has support to get" -> "Gets"
> > 
> > Remove "abstraction" after SimpleInstallationRecord

Done.


- Shanthoosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/#review154675
---


On Nov. 9, 2016, 1:23 a.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52168/
> ---
> 
> (Updated Nov. 9, 2016, 1:23 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch contains the following changes
>  * Http get api to list the complete details of all the tasks that 

Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2016-11-08 Thread Shanthoosh Venkataraman


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 103
> > 
> >
> > Why do we have preferredHost here? The terminology is really confusing. 
> > Isn't `preferredHost` referring to container's host usually?

Was the result of careless refactoring. Removed.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > line 144
> > 
> >
> > Nuke this method. 
> > 
> > Doesn't add any value apart from string concat. Curious about what 
> > value this adds?

Removed container name from the TasksResource. This is no longer needed. 
Removed.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
> >  line 316
> > 
> >
> > Reword docs.
> > 
> > I'm sure there are other ways to read changelog partition mapping apart 
> > from this method.

Done.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
> >  line 320
> > 
> >
> > Not convinced we need this utility method here. 
> > 
> > Also, read seems to call `start` on the changeLog manager while `write` 
> > does not. 
> > 
> > Then, is the assumption that read must be called prior to write.?

This method is inlined to its invocation.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala,
> >  line 337
> > 
> >
> > This method does not add value. 
> > 
> > Also, it's weird that `writeChangeLogMapping` is doing a 
> > `Read-modify-write operation` on the manager.

This method is inlined to its invocation.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 23
> > 
> >
> > Can we link this to the JobsResource? That way, it;s obvious to the 
> > reader?

Done.


> On Nov. 4, 2016, 8:52 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/rest/resources/tasks.md, line 22
> > 
> >
> > I don't think the line that `Additional operations will be added later` 
> > is meaningful. You can maybe remove it?

Done.


- Shanthoosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/#review154972
---


On Nov. 9, 2016, 1:23 a.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52168/
> ---
> 
> (Updated Nov. 9, 2016, 1:23 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch contains the following changes
>  * Http get api to list the complete details of all the tasks that belongs to 
> a job. 
>  * Refactored some methods in coordinator stream, to reuse the existing 
> functionality of getting jobConfig from the coordinator stream.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/resource-directory.md 
> 79746d1e2eb3491e4bd26c3c7cf6c7efd150d8ef 
>   docs/learn/documentation/versioned/rest/resources/tasks.md PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
> df63b97e9d598ecd1840111ba490a723e410d089 
>   samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
> 022b480856483059cb9f837a08f97a718bc04c31 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> c4836f202f7eda1d4e71eac94fd48e46207b0316 
>   samza-rest/src/main/config/samza-rest.properties 
> 7be0b47d1466d2199ae278247e8d81522fb6a91c 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Partition.java 
> PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/model/Task.java PRE-CREATION 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java
>  4d8647f3e1e650632e38b47ba5a8a2dac004f545 
>   
> samza-rest/src/main/java/org/apache/samza/rest/proxy/job/JobProxyFactory.java 
> 067711a74e5b0d7277a9c8b2d2517b56e9cfbcca 
>   
> 

Re: Review Request 52168: Tasks endpoint to list the complete details of all tasks related to a job

2016-11-08 Thread Shanthoosh Venkataraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52168/
---

(Updated Nov. 9, 2016, 1:23 a.m.)


Review request for samza.


Repository: samza


Description
---

This patch contains the following changes
 * Http get api to list the complete details of all the tasks that belongs to a 
job. 
 * Refactored some methods in coordinator stream, to reuse the existing 
functionality of getting jobConfig from the coordinator stream.


Diffs (updated)
-

  docs/learn/documentation/versioned/rest/resource-directory.md 
79746d1e2eb3491e4bd26c3c7cf6c7efd150d8ef 
  docs/learn/documentation/versioned/rest/resources/tasks.md PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala 
df63b97e9d598ecd1840111ba490a723e410d089 
  samza-core/src/main/scala/org/apache/samza/job/JobRunner.scala 
022b480856483059cb9f837a08f97a718bc04c31 
  samza-core/src/main/scala/org/apache/samza/util/Util.scala 
c4836f202f7eda1d4e71eac94fd48e46207b0316 
  samza-rest/src/main/config/samza-rest.properties 
7be0b47d1466d2199ae278247e8d81522fb6a91c 
  samza-rest/src/main/java/org/apache/samza/rest/model/Partition.java 
PRE-CREATION 
  samza-rest/src/main/java/org/apache/samza/rest/model/Task.java PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/job/AbstractJobProxy.java 
4d8647f3e1e650632e38b47ba5a8a2dac004f545 
  samza-rest/src/main/java/org/apache/samza/rest/proxy/job/JobProxyFactory.java 
067711a74e5b0d7277a9c8b2d2517b56e9cfbcca 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxy.java
 a935c98730f85f448c688a6baf2e8ddffdbb2cb4 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/job/SimpleYarnJobProxyFactory.java
 11d93d4608d23a4e3fb3bfc50dfac35ab6dbdf3c 
  samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxy.java 
PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/task/SamzaTaskProxyFactory.java
 PRE-CREATION 
  samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxy.java 
PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskProxyFactory.java 
PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/proxy/task/TaskResourceConfig.java
 PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/resources/BaseResourceConfig.java
 PRE-CREATION 
  
samza-rest/src/main/java/org/apache/samza/rest/resources/DefaultResourceFactory.java
 e0224c6bcf4aeaa336e5786ac472482507fcd382 
  samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResource.java 
a566db598c284d69ea61af88fdc0851483d5a089 
  
samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResourceConfig.java
 527482d2ee55747e7b3f9c54c8a3b1afe7ad8797 
  samza-rest/src/main/java/org/apache/samza/rest/resources/Responses.java 
PRE-CREATION 
  samza-rest/src/main/java/org/apache/samza/rest/resources/TasksResource.java 
PRE-CREATION 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/TestJobsResource.java 
7db437b348ecd286185898b8f8ab0220d59da71a 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/TestTasksResource.java 
PRE-CREATION 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockInstallationFinder.java
 PRE-CREATION 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockResourceFactory.java
 PRE-CREATION 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockTaskProxy.java
 PRE-CREATION 
  
samza-rest/src/test/java/org/apache/samza/rest/resources/mock/MockTaskProxyFactory.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/52168/diff/


Testing
---

Manual and unit testing has been done to verify the apis.


Thanks,

Shanthoosh Venkataraman



Re: Review Request 53297: Initial version of adding metrics into samza rest.

2016-11-08 Thread Prateek Maheshwari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53297/#review155380
---




docs/learn/documentation/versioned/rest/monitors.md (line 100)


This section could be more concise.



samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java (line 
36)


Minor: private constructors for helper classes are pretty universal, don't 
think they need a comment.



samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java (line 
40)


This javadoc doesn't add any more information than the method signature 
already provides. Prefer removing.



samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java (line 
45)


Not sure this specific method (for classloading this particular class from 
configs) deserves a new Util class. This is a general pattern used in other 
places, maybe we should extract that as a util instead.

May be worth revisiting if/how we want to share code b/w samza-rest and 
samza-core. Ideally samza-rest shouldn't depend on samza-core, and shared stuff 
should be pulled out to samza-api or samza-common (which doesn't exist yet).



samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java (line 55)


What is this config, and what does container refer to here? The samza-rest 
service container?


- Prateek Maheshwari


On Nov. 8, 2016, 3:13 p.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53297/
> ---
> 
> (Updated Nov. 8, 2016, 3:13 p.m.)
> 
> 
> Review request for samza and Jake Maes.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch aims at enabling users to define custom reporters to send metrics 
> from the monitors. Configurations required for the definition of the metrics 
> reporters follows the same convention as of the samza jobs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/monitors.md 
> 46678bbe5fed99f767c3324dc9578ee1a64cec66 
>   samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> e0468ee89c89fd720834461771ebb36475475bcb 
>   
> samza-core/src/main/scala/org/apache/samza/metrics/ContainerProcessManagerMetrics.scala
>  f24beb1e099dd44b15b475e0a4a7f70560c6965e 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java 
> 47b0663637f6db187d86961377ee3ee203b73fdb 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java 
> 2a3e83a24a5343bb53b93fc9d0a647c1b253714b 
>   samza-rest/src/test/java/org/apache/samza/rest/TestSamzaRestService.java 
> PRE-CREATION 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterMetrics.scala
>  8a5b4aaea6e11a5af999f12d50e5b6135dbc70ca 
> 
> Diff: https://reviews.apache.org/r/53297/diff/
> 
> 
> Testing
> ---
> 
> Unit tests are done to verify the intended functionality.
> 
> 
> Thanks,
> 
> Shanthoosh Venkataraman
> 
>



Re: Review Request 53297: Initial version of adding metrics into samza rest.

2016-11-08 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53297/#review155372
---




docs/learn/documentation/versioned/rest/monitors.md (line 95)


Documentation looks different (than the rest of the page). Would be nice if 
this was instructional in nature.

/s/create/create and report.
/s/theirs/yours



docs/learn/documentation/versioned/rest/monitors.md (line 99)


1. s/Please refer/You can refer
2. `custom` is probably redundant.
3. The link and anchor text placement is weird. 
4. Prefer to be precise with documentation. (For example, this line has the 
word `metrics reporters` thrice)



docs/learn/documentation/versioned/rest/monitors.md (line 100)


This line is not needed IMO. (If we choose to retain, it, should atleast be 
re-worded)



samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java (line 55)


How exactly is this used when metrics are reported? 

Is there a reason for this to be configurable / adding a new config key? 

If it's configurable, we should document it somewhere?



samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java (line 65)


Why is this taking in a concrete class - ReadableMetricsRegistry ? Prefer 
to have this `MetricsRegistry` instead.

We should operate at the level of interfaces whereever possible.


- Jagadish Venkatraman


On Nov. 8, 2016, 11:13 p.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53297/
> ---
> 
> (Updated Nov. 8, 2016, 11:13 p.m.)
> 
> 
> Review request for samza and Jake Maes.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch aims at enabling users to define custom reporters to send metrics 
> from the monitors. Configurations required for the definition of the metrics 
> reporters follows the same convention as of the samza jobs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/monitors.md 
> 46678bbe5fed99f767c3324dc9578ee1a64cec66 
>   samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> e0468ee89c89fd720834461771ebb36475475bcb 
>   
> samza-core/src/main/scala/org/apache/samza/metrics/ContainerProcessManagerMetrics.scala
>  f24beb1e099dd44b15b475e0a4a7f70560c6965e 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java 
> 47b0663637f6db187d86961377ee3ee203b73fdb 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java 
> 2a3e83a24a5343bb53b93fc9d0a647c1b253714b 
>   samza-rest/src/test/java/org/apache/samza/rest/TestSamzaRestService.java 
> PRE-CREATION 
>   
> samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterMetrics.scala
>  8a5b4aaea6e11a5af999f12d50e5b6135dbc70ca 
> 
> Diff: https://reviews.apache.org/r/53297/diff/
> 
> 
> Testing
> ---
> 
> Unit tests are done to verify the intended functionality.
> 
> 
> Thanks,
> 
> Shanthoosh Venkataraman
> 
>



Re: Review Request 53453: Add optional interface for SystemConsumer checkpontListener() for checkpoint notifications

2016-11-08 Thread Prateek Maheshwari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53453/#review155377
---



Some minor code style/documentation related comments. One correctness question.


samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointListener.java 
(line 25)


No comma after SystemConsumers



samza-api/src/main/java/org/apache/samza/system/SystemConsumer.java (line 134)


s/checkpoint/register the same offset.

Also, maintain previous line width for comment.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
77)


Strongly prefer not adding empty Map() as default value here (and maybe 
clean up the other one too). See my comment on Xinyu's HDFS performance RB for 
explanation.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
146)


s/ones/one



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
239)


Are you missing a foreach here?

I think you need something like:
lastProcessedOffsets.get(taskName)
  .foreach { sspToOffsets => sspToOffsets.foreach { case (ssp, checkpoint) 
=> offsetManagerMetrics.checkpointedOffsets(ssp).set(checkpoint) } }

If the version above is correct, can we add a test for this? Its easy to 
miss.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
243)


Can remove comment, same information in next line.



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
244)


Space after 'case' and after ':'



samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala (line 
245)


s/is an empty list/is empty



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala (line 
361)


Indent by 2
Space before and after =>
Space b/w map and {



samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala (line 
366)


Indent by 2.


- Prateek Maheshwari


On Nov. 4, 2016, 4:23 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53453/
> ---
> 
> (Updated Nov. 4, 2016, 4:23 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1042
> https://issues.apache.org/jira/browse/SAMZA-1042
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Add optional interface for SystemConsumer checkpontListener() for checkpoint 
> notifications.
> 
> 
> Diffs
> -
> 
>   samza-api/src/main/java/org/apache/samza/checkpoint/CheckpointListener.java 
> PRE-CREATION 
>   samza-api/src/main/java/org/apache/samza/system/SystemConsumer.java 
> 8dfcc7499659442aabd3085a8787475fe38f29db 
>   samza-core/src/main/scala/org/apache/samza/checkpoint/OffsetManager.scala 
> c41eadb70f4675816245f7ac40f0db2fc16335f0 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> e0468ee89c89fd720834461771ebb36475475bcb 
>   
> samza-core/src/test/scala/org/apache/samza/checkpoint/TestOffsetManager.scala 
> cb78223f1b59a78bbeb1e42b5974670a53def504 
> 
> Diff: https://reviews.apache.org/r/53453/diff/
> 
> 
> Testing
> ---
> 
> gradlew test.
> manual testing.
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-08 Thread Prateek Maheshwari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53282/#review155376
---




samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala (line 
155)


Maybe 'timer' or 'highResolutionClock' instead, since clock can be confused 
for currentTimeMillis? As Jagadish pointed out in an unrelated context, they're 
not equivalent since one returns epoch time and the other is only useful for 
time differences.



samza-kv/src/main/scala/org/apache/samza/storage/kv/BaseKeyValueStorageEngineFactory.scala
 (line 138)


Only tangentially related to RB, but this feels wrong in terms of config 
(and code) dependencies: StorageEngine is reading the Metrics configuration out 
of the entire container context, so that it can choose what clock the Timer 
should use.

Haven't thought this through, but maybe we should refactor things to move 
the clock to the Timer class instead?


- Prateek Maheshwari


On Nov. 7, 2016, 4:47 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53282/
> ---
> 
> (Updated Nov. 7, 2016, 4:47 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> In the recent experiments of samza batch job (consuming hdfs data on hadoop), 
> the results are subpar to map/reduce and spark. By looking at the metrics 
> closely, we found two basic problems:
> 
> 1) Not enough data to process. This is spotted as the unprocessed message 
> queue length was zero for quite a lot of times.
> 
> 2) Not process fast enough. We found samza performed closely in both median 
> size records (100B) and small record (10B), while spark can scale very well 
> in the small record (over 1M/s).
> 
> The first problem is solved by increasing the buffer size. This ticket is to 
> address the second problem, which contains three major improvements:
> 
> - Option to turn off timer metrics calculation: one of the main time spent in 
> samza processing turns out to be just keeping the timer metrics. While it is 
> useful in debugging, it becomes a bottleneck when running a stable job with 
> high performance. In my testing job which consumes 8M mock data, it took 30 
> secs with timer metrics on. After turning it off, it only took 14 secs.
> 
> - Java coding improvements: The AsyncRunLoop code can be further optimized 
> for efficiency. Some of the thread-safe data structure I am using is not for 
> optimal performance (Collections.synchronizedSet). I switched to use 
> CopyOnWriteArraySet, which has far better performance due to more reads and 
> small set size.
> 
> - Specific handling for in-order processing improvements: AsyncRunLoop 
> handles the callbacks regardless of whether it's in-order or out-of-order 
> (max concurrency > 1), which incurs quite some cost. By simplying the logic 
> for in-order handling, the performance gains.
> 
> After all three improvements, my test job with mock input (8M messages) can 
> be processed within 8 sec (down from org 30 secs), so it's 1M/s for one cpu 
> core.
> 
> For the performance benchmark jobs running in Hadoop, we also see a 4 times 
> improvement with all the fixes above. Please take a look at the attached 
> spreedsheet (see the numbers with fix(turn off the timing metrics) and 
> fix2(all three together).
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
> 609a956a1f2fa97419c2f66fe2fb6876aaaeecd0 
>   samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java 
> 8fac8155c7f64e67d4a39ec6943f98da1e1d63d9 
>   samza-core/src/main/java/org/apache/samza/task/CoordinatorRequests.java 
> 052b3b91ec609ca6288662cfa2d3e71b0273d020 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackImpl.java 
> 9b700998d2af040c6734289f7f28bbd78c36bd2c 
>   samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java 
> 132cf59eb593524a4cac134aeceeeb37a4c74b1f 
>   samza-core/src/main/java/org/apache/samza/util/TimerClock.java PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/util/Utils.java 
> 472e0a59d5aa992b136292c8a3347c311e2cd606 
>   samza-core/src/main/scala/org/apache/samza/config/MetricsConfig.scala 
> c3fd8bfb2e16a4c5146d34682d04cb1d4e9bbe72 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> e0468ee89c89fd720834461771ebb36475475bcb 
>   samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala 
> e2aed5b1c2e77a914268963b21809380972037b6 
>   samza-core/src/main/scala/org/apache/samza/util/Util.scala 
> 

Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-08 Thread Prateek Maheshwari


> On Nov. 2, 2016, 12:57 p.m., Prateek Maheshwari wrote:
> >

Sorry for the late reply, didn't get an email notification for your replies.


> On Nov. 2, 2016, 12:57 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/system/SystemConsumers.scala, 
> > line 65
> > 
> >
> > I don't think we should ever be creating a default new SerdeManager 
> > just for this class. Same for SystemConsumerMetrics (and other places where 
> > this pattern is used for objects). Constants are fine.
> 
> Xinyu Liu wrote:
> I think these are created as default for the testing purpose. Normally we 
> will pass in the real SerdeManager.

I'd still argue that we shouldn't do this.
1. It makes it possible to accidentally forget passing objects which are 
actually necessary for this class to be functional.
2. Looking at the signature of these constructors, there's no indication 
whether the field is really required or optional.
2. We rely on this pattern in other places for automatically creating objects. 
It makes looking up what classes are being used where more difficult.

The only benefit is saving a few characters when instantiating in tests. Would 
strongly prefer always passing objects explicitly.


> On Nov. 2, 2016, 12:57 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/util/Util.scala, line 397
> > 
> >
> > This seems like a weird method to have. Would prefer to remove.
> 
> Xinyu Liu wrote:
> This makes a lot easier to convert a Java TimerClock object to a scala 
> function to return the time. I agree it's not pretty, but there has to be a 
> workaround for converting this right now.

Can we pass TimerClock to Scala classes too? Should be cleaner.


> On Nov. 2, 2016, 12:57 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/java/org/apache/samza/util/TimerClock.java, line 25
> > 
> >
> > Can we just add this method to the existing Clock interface? Weird to 
> > have two clock interfaces.
> 
> Xinyu Liu wrote:
> Chris raised the same question. The other interface has an extra method I 
> don't need, plus I want to use lamdba for this. I think I can make the 
> HighResolutionClock extends from this one. Does that sounds better?

As you mentioned earlier, moving the other method in HighResolutionClock to the 
calling class and use it instead makes sense.


> On Nov. 2, 2016, 12:57 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/config/MetricsConfig.scala, line 
> > 33
> > 
> >
> > Minor: s/timer/timers (same for field name)
> 
> Xinyu Liu wrote:
> I put this config to mean all timer metrics are turned on/off. So I guess 
> this config name should be fine.

Plural seems more appropriate grammatically since this applies to all timers.


- Prateek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53282/#review154602
---


On Nov. 7, 2016, 4:47 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53282/
> ---
> 
> (Updated Nov. 7, 2016, 4:47 p.m.)
> 
> 
> Review request for samza, Chris Pettitt, Jake Maes, and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> In the recent experiments of samza batch job (consuming hdfs data on hadoop), 
> the results are subpar to map/reduce and spark. By looking at the metrics 
> closely, we found two basic problems:
> 
> 1) Not enough data to process. This is spotted as the unprocessed message 
> queue length was zero for quite a lot of times.
> 
> 2) Not process fast enough. We found samza performed closely in both median 
> size records (100B) and small record (10B), while spark can scale very well 
> in the small record (over 1M/s).
> 
> The first problem is solved by increasing the buffer size. This ticket is to 
> address the second problem, which contains three major improvements:
> 
> - Option to turn off timer metrics calculation: one of the main time spent in 
> samza processing turns out to be just keeping the timer metrics. While it is 
> useful in debugging, it becomes a bottleneck when running a stable job with 
> high performance. In my testing job which consumes 8M mock data, it took 30 
> secs with timer metrics on. After turning it off, it only took 14 secs.
> 
> - Java coding improvements: The AsyncRunLoop code can be further optimized 
> for efficiency. Some of the thread-safe data structure I 

Re: Review Request 53297: Initial version of adding metrics into samza rest.

2016-11-08 Thread Shanthoosh Venkataraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53297/
---

(Updated Nov. 8, 2016, 11:13 p.m.)


Review request for samza and Jake Maes.


Repository: samza


Description
---

This patch aims at enabling users to define custom reporters to send metrics 
from the monitors. Configurations required for the definition of the metrics 
reporters follows the same convention as of the samza jobs.


Diffs (updated)
-

  docs/learn/documentation/versioned/rest/monitors.md 
46678bbe5fed99f767c3324dc9578ee1a64cec66 
  samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java 
PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
e0468ee89c89fd720834461771ebb36475475bcb 
  
samza-core/src/main/scala/org/apache/samza/metrics/ContainerProcessManagerMetrics.scala
 f24beb1e099dd44b15b475e0a4a7f70560c6965e 
  samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java 
47b0663637f6db187d86961377ee3ee203b73fdb 
  samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java 
2a3e83a24a5343bb53b93fc9d0a647c1b253714b 
  samza-rest/src/test/java/org/apache/samza/rest/TestSamzaRestService.java 
PRE-CREATION 
  
samza-yarn/src/main/scala/org/apache/samza/job/yarn/SamzaAppMasterMetrics.scala 
8a5b4aaea6e11a5af999f12d50e5b6135dbc70ca 

Diff: https://reviews.apache.org/r/53297/diff/


Testing
---

Unit tests are done to verify the intended functionality.


Thanks,

Shanthoosh Venkataraman



Re: Review Request 53297: Initial version of adding metrics into samza rest.

2016-11-08 Thread Shanthoosh Venkataraman


> On Nov. 5, 2016, 6:47 p.m., Jake Maes wrote:
> > samza-core/src/main/scala/org/apache/samza/util/Util.scala, line 336
> > 
> >
> > This edit looks like a mistake.
> > 
> > Did this file need to be modified at all?

Idea was showing error because of that tag. It's not required for this patch. 
Removed.


> On Nov. 5, 2016, 6:47 p.m., Jake Maes wrote:
> > samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java, line 
> > 73
> > 
> >
> > What's the purpose of this static factory?
> > 
> > The typical reason is to construct the object differently (e.g. a 
> > different subclass) based on some parameter. 
> > 
> > I don't see any value of the approach here.
> > 
> > If there is some value, then the constructor should be made private.

Discussed offline. It was harder to test SamzaRestService with all new operator 
instantiations happening in the constructor. Hence, mocking out these 
dependencies were harder. Also with this approach, looking at the constructor 
we get to know the dependencies of SamzaRestService explicitly. However, since 
there are no use cases which would require this factory, removing this factory. 
Keeping the constructor as it is, moving the instantiation into the main method.


> On Nov. 5, 2016, 6:47 p.m., Jake Maes wrote:
> > samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java, line 
> > 75
> > 
> >
> > I don't think the MetricsConfig constructure takes a subset. 
> > 
> > I think it takes the root and expects to find the "metrics" prefix
> 
> Jake Maes wrote:
> s/constructure/constructor
> 
> phonetic brain fail

Yes, that's true. It expects the root and finds the metrics prefix. Hence, 
stripPrefix is passed on as false, so that prefix isn't removed. This just 
selects the config subset that starts with METRICS_PREFIX, without removing the 
prefix. The goal is to not pass on the entire config object and pass only 
metrics related config into MetricsConfig constructor.


> On Nov. 5, 2016, 6:47 p.m., Jake Maes wrote:
> > samza-core/src/main/scala/org/apache/samza/util/MetricsReporterLoader.scala,
> >  line 40
> > 
> >
> > We're moving away from Scala. All new files should be Java.

Done. Migrated from scala to java.


> On Nov. 5, 2016, 6:47 p.m., Jake Maes wrote:
> > samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java, line 
> > 60
> > 
> >
> > The Resources will eventually emit metrics too, so I think this value 
> > is too specific.

Changed it to SamzaRest. Not sure of the implications of using a proper name 
here. For instance, in SamzaContainerMetrics source string is assigned to value 
"unknown". I'm most certain that this string is a placeholder to register 
MetricsRegistry instances with MetricsReporter and not used when reporting the 
actual metrics.


- Shanthoosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53297/#review155061
---


On Nov. 8, 2016, 11:13 p.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53297/
> ---
> 
> (Updated Nov. 8, 2016, 11:13 p.m.)
> 
> 
> Review request for samza and Jake Maes.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> This patch aims at enabling users to define custom reporters to send metrics 
> from the monitors. Configurations required for the definition of the metrics 
> reporters follows the same convention as of the samza jobs.
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/rest/monitors.md 
> 46678bbe5fed99f767c3324dc9578ee1a64cec66 
>   samza-core/src/main/java/org/apache/samza/util/MetricsReporterLoader.java 
> PRE-CREATION 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> e0468ee89c89fd720834461771ebb36475475bcb 
>   
> samza-core/src/main/scala/org/apache/samza/metrics/ContainerProcessManagerMetrics.scala
>  f24beb1e099dd44b15b475e0a4a7f70560c6965e 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestConfig.java 
> 47b0663637f6db187d86961377ee3ee203b73fdb 
>   samza-rest/src/main/java/org/apache/samza/rest/SamzaRestService.java 
> 2a3e83a24a5343bb53b93fc9d0a647c1b253714b 
>   samza-rest/src/test/java/org/apache/samza/rest/TestSamzaRestService.java 
> PRE-CREATION 
>   
> 

Re: Using json serde with KeyValueStore

2016-11-08 Thread Raj raj
Thanks!

It is working as expected.

raj
Thanks and Regards,

Raj


On Sat, Nov 5, 2016 at 11:24 PM, Jagadish Venkatraman
 wrote:
> You can write a small test job and verify this! I believe the serde should
> probably be able to serialize it correctly.
>
> You can define the KeyValueStore to be typed as .
>
> - You can put the LHM directly into the store.
> -  When you do a get on the store, you should probably be able re-cast it
> back to an LHM.
>
> String/json serialization is not necessarily performant for all
> cases. There maybe other efficient serialization formats like protobuffs/
> avro depending on the use case. Should you want to write your own
> serialization have a look at the SerdeFactory interface.
>
> Thanks
> Jag
>
> On Saturday, November 5, 2016, Raj raj  wrote:
>
>> Hi,
>>
>> I am starting out with Samza and my test program is able to read a
>> message from Kafka. My next task is to write the message to KeyValue
>> store. The message is formatted as Json in Kafka and I am using json
>> serde as my msg serde. The incoming message is available to me as a
>> LinkedHashMap when I call envelope.getMessage(). i.e. serde
>> automatically converted the json string to a LinkedHashMap.
>>
>> Now I want to store this to KeyValueStore. Will using a json serde for
>> stores.test-container.msg.serde automatically convert the
>> LinkedHashMap to a format suitable for storing in the KeyValueStore?
>> i.e. the reverse process of what it did when I called
>> envelope.getMessage().
>>
>> If it is not possible what is the best practice of storing a
>> LinkedHashMap in KeyStoreValue?
>>
>> I am pasting the current code I am working with in pastebin for
>> reference, there isn't much in the code, just the boilerplate stuff.
>>
>> test-container.properties : http://pastebin.com/dkbnjtNh
>> TestContainer.java : http://pastebin.com/93ZBgeWa
>>
>> Thanks and Regards,
>>
>> Raj
>>
>
>
> --
> Sent from my iphone.