Review Request 46296: SAMZA-932: JMX port collisions in JmxServer

2016-04-15 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/46296/
---

Review request for samza.


Repository: samza


Description
---

SAMZA-932: JMX port collisions in JmxServer


Diffs
-

  samza-core/src/main/scala/org/apache/samza/metrics/JmxServer.scala 
e6204c10878589d34096378e6000709266a9b4a5 

Diff: https://reviews.apache.org/r/46296/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Re: Review Request 41106: SAMZA-833: ProcessJob mishandling containers

2016-04-14 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41106/
---

(Updated April 15, 2016, 3:55 a.m.)


Review request for samza.


Repository: samza


Description (updated)
---

rebase branch


Diffs (updated)
-

  samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala 
17c2e5be9ee216c88e3c07784c4f9c05cd92e9c9 

Diff: https://reviews.apache.org/r/41106/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest
3. manual test open source hello samza and verify that by giving 
yarn.container.count or job.container.count with value greater than 1 for 
ProcessJobFactory, the test will throw the desired exception.


Thanks,

Tao Feng



Review Request 44420: SAMZA-888: SamzaContainer initializes ExponentialSleepStrategy with randInt that is inclusive of 0

2016-03-04 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44420/
---

Review request for samza.


Repository: samza


Description
---

SAMZA-888: SamzaContainer initializes ExponentialSleepStrategy with randInt 
that is inclusive of 0


Diffs
-

  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
e3d0b6cb4181e58c1526efecd6fb58cdd69d8fad 

Diff: https://reviews.apache.org/r/44420/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Re: Review Request 44418: SAMZA-887: Use cached JobModel everywhere in the Samza AM container

2016-03-04 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44418/
---

(Updated March 5, 2016, 7:08 a.m.)


Review request for samza.


Summary (updated)
-

SAMZA-887: Use cached JobModel everywhere in the Samza AM container


Repository: samza


Description
---

SAMZA-877: Use cached JobModel everywhere in the Samza AM container


Diffs
-

  samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
0c6329ede9b3df4dc05125729b5b44ba2c98803a 

Diff: https://reviews.apache.org/r/44418/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Review Request 44418: SAMZA-877: Use cached JobModel everywhere in the Samza AM container

2016-03-04 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44418/
---

Review request for samza.


Repository: samza


Description
---

SAMZA-877: Use cached JobModel everywhere in the Samza AM container


Diffs
-

  samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
0c6329ede9b3df4dc05125729b5b44ba2c98803a 

Diff: https://reviews.apache.org/r/44418/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Re: key value store restore time

2016-02-19 Thread Tao Feng
Hi Leo,

At linkedin when we switched to using RocksDB for Samza last year, we did
some tests to see how well RocksDB performs. We used the rocksdb
microbenchmark(
https://github.com/facebook/rocksdb/blob/master/java/benchmark/src/main/java/org/rocksdb/benchmark/DbBenchmark.java)
to conduct serval tests. For sequential write (10 bytes key, 800 bytes
value, 1,000,000,000 entries), Rocksdb write throughput is around 311 MB
/sec with SSD. You could take a look at the result (
https://issues.apache.org/jira/secure/attachment/12723431/2015-04-06%20RocksDB%20Performance.pdf)
from SAMZA-543 attachment.

When Samza restore data in RocksDB, it is doing RocksDB db put operation
for entry(RocksDbKeyValueStore->putAll). And it takes time to reseed if
your changelog is huge. Hence Samza 0.10 introduce Yarn host-affinity
feature which Jagadish mentions. This should help to solve the long RocksDB
restore time in most cases.

Thanks,
-Tao

On Thu, Feb 18, 2016 at 8:35 AM, Leo Woessner  wrote:

> We are starting to use the key-value store with rocksdb.  We are trying to
> offically add Samza to our stack and functionally everything is great. But,
>
> I am seeing minutes to hours restore time.  Does anyone have any benchmarks
> on data size versus restore time?  My big question is how will this scale.
>
> Thanks in advance
>
> --
> Leo Woessner
>


Re: ChangeLog Question for TTL rocksDB stores

2016-02-05 Thread Tao Feng
Hi David,

My understanding is that Samza changelog could still be logging those
entries deleted by RocksDB TTL. You could refer to SAMZA-677 for more
information.

Thanks,
-Tao

On Wed, Feb 3, 2016 at 5:46 PM, David Garcia 
wrote:

> Boris, thank you for the clarification.  But just to make sure I
> understand, is it correct to say that entries deleted by the TTL-policy in
> rocksDB will NOT be logged in the change-log?  My job processes a lot of
> data and saves a large portion of it to RocksDB (for reference later…but
> subject to a retention policy).  I need to ensure that rocksDB doesn’t
> grow uncontrollably.  If the TTL isn’t reflected in the changelog, then
> it’s quite possible that job restart will push too many messages into
> rocksDB.  Thx again for the help!
>
> -David
>
> On 2/3/16, 7:35 PM, "Boris Shkolnik"  wrote:
>
> >As Jacob mentioned there is not direct relationship between the rocksdb
> >tts
> >(internal to rocksdb) and changelog (done by Samza).
> >The problem may arise if the store is restored from the changelog, since
> >the log will have the expired entries, and they will be entered with the
> >NEW date (and as Yi mentioned, there is no TTL on kafka based changelogs
> >now).
> >But since it is not an error per se, SAMZA-862
> > has changed this
> message
> >to be a warning instead of an error.
> >
> >On Thu, Jan 28, 2016 at 11:51 AM, Yi Pan  wrote:
> >
> >> Hi, David,
> >>
> >> The "compaction" referred to together w/ TTL is referring to RocksDb's
> >> compaction, not the Kafka-based changelog topic. Currently, TTL is not
> >> applied to Kafka-based changelog topic. SAMZA-677 is opened for this.
> >>
> >> -Yi
> >>
> >> On Thu, Jan 28, 2016 at 11:36 AM, David Garcia
> >>  >> > wrote:
> >>
> >> > Ok, that makes sense.  I had assumed that the changelog was supported
> >> > because the docs mention that TTL is enforced upon ³compaction² (I had
> >> > assumed compaction of the DB changelog).  Which topic does the TTL
> >>policy
> >> > listen for the compaction of (since compaction policies of topics can
> >> > differ)?
> >> >
> >> > -David
> >> >
> >> > On 1/27/16, 8:46 PM, "Jacob Maes"  wrote:
> >> >
> >> > >Here's my understanding. The others can correct me if I'm mistaken.
> >> > >
> >> > >Samza provides the changelog functionality by intercepting RocksDB
> >>"put"
> >> > >and "delete" operations. However, TTL is managed by RocksDB
> >>internally
> >> and
> >> > >there aren't any hooks exposed in the RocksDB JNI. So there are 2
> >> problems
> >> > >that arise with TTL and change logging:
> >> > >1. Samza doesn't know when an entry expires, so it can't delete the
> >> > >expired
> >> > >entry from the changelog.
> >> > >2. The changelog currently has no concept of entry age/timestamp, so
> >> when
> >> > >the changelog is restored, it's unknown whether some subset (or all)
> >>of
> >> > >the
> >> > >entries should be immediately expired.
> >> > >
> >> > >These issues aren't insurmountable, but they weren't pursued for the
> >> > >initial implementation. Perhaps because there was a shortage of use
> >> cases
> >> > >that needed both TTL and changelogging, but I'm not sure.
> >> > >
> >> > >-Jake
> >> > >
> >> > >On Wed, Jan 27, 2016 at 6:19 PM, David Garcia
> >> > >
> >> > >wrote:
> >> > >
> >> > >> So, I saw this very scary message:
> >> > >>
> >> > >>
> >> > >> ERROR - e.kv.RocksDbKeyValueStore$ - sessionJoinStore is a TTL
> >>based
> >> > >> store, changelog is not supported for TTL based stores, use at your
> >> own
> >> > >> discretion
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> A few of questions:
> >> > >>
> >> > >> 1.) Does this mean that this store is NOT backed by the changelog?
> >> > >>
> >> > >> 2.) Provided that the store IS backed by a change log, do the TTL
> >> > >> expirations commit removals from the changelog (I.e.
> >> Nulls)...presumably
> >> > >> upon compaction
> >> > >>
> >> > >> 3.) Can I please get a bit more detail on how TTL affects a
> >>changelog
> >> > >> store?
> >> > >>
> >> > >>
> >> > >> -David
> >> > >>
> >> >
> >> >
> >>
>
>


Re: samza gc tuning, what about serial + serial old?

2016-02-01 Thread Tao Feng
Hi Bo,

I think application usually would not want to use Serial GC which is
designed only for uniprocessor. If you have 8G~10G memory, the STW time
with serial GC could be quite large.  Even Samza is designed for one core
as you mentioned, if the message rate of upstream data is huge, you would
still need to have multiple Samza containers to consume the upstream data
in order to avoid message lag(fall behind produce data offset). With full
GC(each ~10s) happened frequently , the QPS could be very minimal which I
would imagine it is hard for this Samza job to keep up the upstream data
message rate.

Thanks,
-Tao

On Sun, Jan 31, 2016 at 7:58 PM, Liu Bo  wrote:

> Hi group
>
> We are trying to migrate our current streaming pipeline to samza. Our
> pipeline has several NLP modules, such as segment, POS, and a lot of score
> calculation. Each process normally needs 8~10GB memory.
>
> Our goal is high throughput so we use Parallel Scavenge + Parallel Old in
> our current setup. We've tried G1 in Java 8 U65, it's not so good for
> throughput.
>
> My question is since samza is designed for one core, dose it means that
> Serial + Serial Old is the best garbage collector for samza? On paper
> serial is more efficient.
>
> If it's not could someone share your experience on samza GC tuning for
> discussion? Thanks in advance.
>
> --
> All the best
>
> Liu Bo
>


Review Request 43011: SAMZA-807: Need a configuration variable to ignore SystemStreamPartitionGrouperFactory changes in checkpoint topic

2016-01-30 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43011/
---

Review request for samza.


Repository: samza


Description
---

Add a configuration tunable(job.systemstreampartition.grouper.diff.ignored) to 
ignore SystemStreamPartitionGrouperFactory changes in checkpoint topic with 
default set to false.
Add unit test


Diffs
-

  docs/learn/documentation/versioned/jobs/configuration-table.html 
67055307329c81c9347f1de352420a48a63634b4 
  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
1a8adae4d30fa198c90e8c177c7f17269c5953cd 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointLogKey.scala
 ea8462dc10d0c98fe99215e6b6b9d59cce4bcffa 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 787de1f62479a098bf251f072fca03bbf92f7c6d 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManagerFactory.scala
 7db894091284794b7f5fac164eb55b5d78184a36 
  
samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TeskKafkaCheckpointLogKey.scala
 c360b6c74ff924311f350df2e49424351a5c8d07 
  
samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TestKafkaCheckpointManager.scala
 af4051b28df5eeaeaee527a24907a8e66441f743 

Diff: https://reviews.apache.org/r/43011/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Review Request 43010: SAMZA-807: Need a configuration variable to ignore SystemStreamPartitionGrouperFactory changes in checkpoint topic

2016-01-30 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/43010/
---

Review request for samza.


Repository: samza


Description
---

1. Add a configuration tunable(job.systemstreampartition.grouper.diff.ignored) 
to ignore SystemStreamPartitionGrouperFactory changes in checkpoint topic with 
default set to false.
2. Add unit test


Diffs
-

  docs/learn/documentation/versioned/jobs/configuration-table.html 
67055307329c81c9347f1de352420a48a63634b4 
  samza-core/src/main/scala/org/apache/samza/config/JobConfig.scala 
1a8adae4d30fa198c90e8c177c7f17269c5953cd 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointLogKey.scala
 ea8462dc10d0c98fe99215e6b6b9d59cce4bcffa 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManager.scala
 787de1f62479a098bf251f072fca03bbf92f7c6d 
  
samza-kafka/src/main/scala/org/apache/samza/checkpoint/kafka/KafkaCheckpointManagerFactory.scala
 7db894091284794b7f5fac164eb55b5d78184a36 
  
samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TeskKafkaCheckpointLogKey.scala
 c360b6c74ff924311f350df2e49424351a5c8d07 
  
samza-kafka/src/test/scala/org/apache/samza/checkpoint/kafka/TestKafkaCheckpointManager.scala
 af4051b28df5eeaeaee527a24907a8e66441f743 

Diff: https://reviews.apache.org/r/43010/diff/


Testing
---

./gradlew clean build && ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Review Request 42485: SAMZA-857: Missing break in RocksDbOptionsHelper#options()

2016-01-18 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/42485/
---

Review request for samza.


Repository: samza


Description
---

SAMZA-857: Missing break in RocksDbOptionsHelper#options()


Diffs
-

  
samza-kv-rocksdb/src/main/java/org/apache/samza/storage/kv/RocksDbOptionsHelper.java
 e474231f48c2d9ba0c9a73291afcc19b52ce8da1 

Diff: https://reviews.apache.org/r/42485/diff/


Testing
---


Thanks,

Tao Feng



Re: Review Request 41365: SAMZA-838: negative rocksdb.ttl.ms is not handled correctly

2015-12-15 Thread Tao Feng


> On Dec. 15, 2015, 7:59 p.m., Navina Ramesh wrote:
> > samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala,
> >  line 54
> > <https://reviews.apache.org/r/41365/diff/1/?file=1163801#file1163801line54>
> >
> > I don't think mentioning RocksDB wiki is necessary. You can change it 
> > to:  
> > "Non-positive TTL for RocksDB implies infinite TTL for the data."

Thanks Navina for the suggestion. I will update the rb.


- Tao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41365/#review110553
-------


On Dec. 14, 2015, 9:04 p.m., Tao Feng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41365/
> ---
> 
> (Updated Dec. 14, 2015, 9:04 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> rocskdb ttl is handled per RocksDB 
> semantics(https://github.com/facebook/rocksdb/wiki/Time-to-Live) which zero 
> or negative ttl is same as infinity ttl.
> 
> 
> Diffs
> -
> 
>   
> samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
>  211fc3be1e168f1f92812406785b39b5a3fd9174 
> 
> Diff: https://reviews.apache.org/r/41365/diff/
> 
> 
> Testing
> ---
> 
> ./gradlew clean build &&  ./gradlew checkstyleMain checkstyleTest
> 
> 
> Thanks,
> 
> Tao Feng
> 
>



Re: Review Request 41365: SAMZA-838: negative rocksdb.ttl.ms is not handled correctly

2015-12-15 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41365/
---

(Updated Dec. 15, 2015, 11:58 p.m.)


Review request for samza.


Repository: samza


Description (updated)
---

update per Navina's comment


Diffs (updated)
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 211fc3be1e168f1f92812406785b39b5a3fd9174 

Diff: https://reviews.apache.org/r/41365/diff/


Testing
---

./gradlew clean build &&  ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Review Request 41365: SAMZA-838: negative rocksdb.ttl.ms is not handled correctly

2015-12-14 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41365/
---

Review request for samza.


Repository: samza


Description
---

rocskdb ttl is handled per RocksDB 
semantics(https://github.com/facebook/rocksdb/wiki/Time-to-Live) which zero or 
negative ttl is same as infinity ttl.


Diffs
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 211fc3be1e168f1f92812406785b39b5a3fd9174 

Diff: https://reviews.apache.org/r/41365/diff/


Testing
---

./gradlew clean build &&  ./gradlew checkstyleMain checkstyleTest


Thanks,

Tao Feng



Re: Review Request 41106: SAMZA-833: ProcessJob mishandling containers

2015-12-09 Thread Tao Feng


> On Dec. 9, 2015, 6:06 p.m., Jake Maes wrote:
> > samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala,
> >  line 38
> > <https://reviews.apache.org/r/41106/diff/2/?file=1156859#file1156859line38>
> >
> > nit: this message will no longer be valid after we fully deprecate 
> > yarn.container.count. It would be more future-proof to say "Container count 
> > larger than 1..."

Thanks for the suggestion. I will send out an updated RB later today.


- Tao


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41106/#review109571
-------


On Dec. 9, 2015, 6:41 a.m., Tao Feng wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/41106/
> ---
> 
> (Updated Dec. 9, 2015, 6:41 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> 1. throws runtime exception if user sets yarn.container.count or 
> job.container.count larger than 1 for ProcessJobFactory; 
> 2. update per Yi's comment
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala 
> 0792a59cb7b042220c8dbfc0713c5ef42e93ab25 
> 
> Diff: https://reviews.apache.org/r/41106/diff/
> 
> 
> Testing
> ---
> 
> 1. ./gradlew clean build
> 2. ./gradlew checkstyleMain checkstyleTest
> 3. manual test open source hello samza and verify that by giving 
> yarn.container.count or job.container.count with value greater than 1 for 
> ProcessJobFactory, the test will throw the desired exception.
> 
> 
> Thanks,
> 
> Tao Feng
> 
>



Re: Review Request 41106: SAMZA-833: ProcessJob mishandling containers

2015-12-09 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41106/
---

(Updated Dec. 10, 2015, 4:34 a.m.)


Review request for samza.


Repository: samza


Description (updated)
---

update per Jake's comment


Diffs (updated)
-

  samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala 
0792a59cb7b042220c8dbfc0713c5ef42e93ab25 

Diff: https://reviews.apache.org/r/41106/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest
3. manual test open source hello samza and verify that by giving 
yarn.container.count or job.container.count with value greater than 1 for 
ProcessJobFactory, the test will throw the desired exception.


Thanks,

Tao Feng



Re: [VOTE] Samza 0.10.0 Release Candidate 2

2015-12-09 Thread Tao Feng
+1 from my side(non-binding). I download the package and successfully run
all the unit tests without failure.

On Tue, Dec 8, 2015 at 3:38 PM, Yi Pan  wrote:

> Hey all,
>
>
> This is a call for a vote on a release of Apache Samza 0.10.0. Thanks to
> everyone who has contributed to this release. We are very glad to see some
> new contributors in this release.
>
>
> The release candidate can be downloaded from here:
>
>
> http://home.apache.org/~nickpan47/samza-0.10.0-rc2/
>
>
> The release candidate is signed with pgp key 911402D8, which is
>
> included in the repository's KEYS file:
>
>
>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=KEYS;h=66cbd15cddbbd798c3529e9a8b7f052aab0037a7
>
>
> and can also be found on keyservers:
>
> http://pgp.mit.edu/pks/lookup?op=get=0x911402D8
>
>
> The git tag is release-0.10.0-rc2 and signed with the same pgp key:
>
>
>
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=4a37a3c4754b94805646522fc6644f2dd998e828
>
>
> Test binaries have been published to Maven's staging repository, and are
>
> available here:
>
>
> https://repository.apache.org/content/repositories/orgapachesamza-1011/
>
>
> Note that the binaries were built with JDK7 without incident. This is the
> first version of Samza that does not support JDK6 any more.
>
>
> 128 issues were resolved for this release:
>
>
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.10.0%20AND%20status%20in%20(Resolved%2C%20Closed)
>
>
> The vote will be open for 72 hours ( end in 4:00pm Friday, 12/11/2015 ).
>
> Please download the release candidate, check the hashes/signature, build it
>
> and test it, and then please vote:
>
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
>
> +1 from my side for the release.
>
>
> Yi Pan
>
> nickpa...@gmail.com
>


Re: Review Request 41106: SAMZA-833: ProcessJob mishandling containers

2015-12-08 Thread Tao Feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/41106/
---

(Updated Dec. 9, 2015, 6:41 a.m.)


Review request for samza.


Repository: samza


Description (updated)
---

1. throws runtime exception if user sets yarn.container.count or 
job.container.count larger than 1 for ProcessJobFactory; 
2. update per Yi's comment


Diffs (updated)
-

  samza-core/src/main/scala/org/apache/samza/job/local/ProcessJobFactory.scala 
0792a59cb7b042220c8dbfc0713c5ef42e93ab25 

Diff: https://reviews.apache.org/r/41106/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest
3. manual test open source hello samza and verify that by giving 
yarn.container.count or job.container.count with value greater than 1 for 
ProcessJobFactory, the test will throw the desired exception.


Thanks,

Tao Feng



Re: Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-20 Thread tao feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/
---

(Updated Nov. 20, 2015, 8:20 a.m.)


Review request for samza.


Repository: samza


Description (updated)
---

1. implement RocksDB flush method;
2. RocksDB flushOptions waitForFlush set to true;
3. unit test
 
Diff-2: Update per Yi and Navina's comments for unit test: open a read only 
RocksDB to verify the data


Diffs (updated)
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
 b949793a63951576937fa848bd674ec68f6f9727 
  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 4620037f2a0da43149a754aa808d3d5d280ea893 
  
samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 a428a16bc1e9ab4980a6f17db4fd810057d31136 

Diff: https://reviews.apache.org/r/40525/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest


Thanks,

tao feng



Review Request 40525: SAMZA-819: RocksDbKeyValueStore.flush() should be implemented

2015-11-19 Thread tao feng

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40525/
---

Review request for samza.


Repository: samza


Description
---

1. implement RocksDB flush method; 
2. RocksDB flushOptions waitForFlush set to true; 
3. unit test


Diffs
-

  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStorageEngineFactory.scala
 b949793a63951576937fa848bd674ec68f6f9727 
  
samza-kv-rocksdb/src/main/scala/org/apache/samza/storage/kv/RocksDbKeyValueStore.scala
 4620037f2a0da43149a754aa808d3d5d280ea893 
  
samza-kv-rocksdb/src/test/scala/org/apache/samza/storage/kv/TestRocksDbKeyValueStore.scala
 a428a16bc1e9ab4980a6f17db4fd810057d31136 

Diff: https://reviews.apache.org/r/40525/diff/


Testing
---

1. ./gradlew clean build
2. ./gradlew checkstyleMain checkstyleTest


Thanks,

tao feng



Re: Use one producer for both coordinator stream and users system?

2015-08-18 Thread Tao Feng
Thanks Yan. I guess I am not very clear with the coordinatorStream concept
before.
-Tao

On Tue, Aug 18, 2015 at 12:26 AM, Yan Fang yanfang...@gmail.com wrote:

 Hi Tao,

 First, one kafka producer has an i/o thread. (correct me if I am wrong).

 Second, after Samza 0.10.0, we have a coordinator stream, which stores the
 checkpoint, config and other locality information for auto-scaling, dynamic
 configuration, etc purpose. (See Samza-348
 https://issues.apache.org/jira/browse/SAMZA-348). So we have a producer
 for this coordinator stream.

 Therefore, each contains will have at least two producers, one is for the
 coordinator stream, one is for the users system.

 My question is, can we use only one producer for both coordinator stream
 and the users system to have better performance? (from the doc, it may
 retrieve better performance.)

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Mon, Aug 17, 2015 at 9:49 PM, Tao Feng fengta...@gmail.com wrote:

  Hi Yan,
 
  Naive question: what do we need producer thread of coordinator stream
 for?
 
  Thanks,
  -Tao
 
  On Mon, Aug 17, 2015 at 2:09 PM, Yan Fang yanfang...@gmail.com wrote:
 
   Hi guys,
  
   I have this question because Kafka's doc
   
  
 
 http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
   
   seems recommending having one producer shared by all threads (*The
   producer is thread safe and should generally be shared among all
 threads
   for best performance.*), while currently the coordinator stream is
  using a
   separate producer (usually, there are two producers(two producer
 threads)
   in each container: one is for the coordinator stream , one is for the
   real job)
  
   1. Will having one producer shared by all thread really improve the
   performance? (haven't done the perf test myself. Guess Kafka has some
   proof).
  
   2. if yes, should we go this way?
  
   Thanks,
  
   Fang, Yan
   yanfang...@gmail.com
  
 



Re: Use one producer for both coordinator stream and users system?

2015-08-17 Thread Tao Feng
Hi Yan,

Naive question: what do we need producer thread of coordinator stream for?

Thanks,
-Tao

On Mon, Aug 17, 2015 at 2:09 PM, Yan Fang yanfang...@gmail.com wrote:

 Hi guys,

 I have this question because Kafka's doc
 
 http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
 
 seems recommending having one producer shared by all threads (*The
 producer is thread safe and should generally be shared among all threads
 for best performance.*), while currently the coordinator stream is using a
 separate producer (usually, there are two producers(two producer threads)
 in each container: one is for the coordinator stream , one is for the
 real job)

 1. Will having one producer shared by all thread really improve the
 performance? (haven't done the perf test myself. Guess Kafka has some
 proof).

 2. if yes, should we go this way?

 Thanks,

 Fang, Yan
 yanfang...@gmail.com



Re: Measuring Samza Job Throughput

2015-06-18 Thread Tao Feng
Hi, Milinda, Yi,

Sure. I will be happy to help on this.

Thanks,
-Tao

On Wed, Jun 17, 2015 at 11:35 AM, Yi Pan nickpa...@gmail.com wrote:

 Hi, Milinda,

 Tao @LinkedIn has done some Samza benchmark test using a standard
 word-count task. You may want to reach out to him for some detailed ideas
 on how to set up the perf tests.

 Best!

 -Yi

 On Wed, Jun 17, 2015 at 11:25 AM, Milinda Pathirage mpath...@umail.iu.edu
 
 wrote:

  Thank you all for the ideas. I'll have a look at KafkaSystem metrics and
  SamzaContainerMetrics.
 
  Milinda
 
  On Wed, Jun 17, 2015 at 2:38 AM, Tao Feng fengta...@gmail.com wrote:
 
   Hi,
  
   One metric I could think of related to Samza job throughput is the
   process-envelop metric listed in SamzaContainerMetrics. This counter
   get incremented whenever the container process meaningful message(
  
  
 
 https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala
   
  
  
 
 https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
   ).
  
   But this metric is more like a QPS type of metric .
  
   Thanks,
   -Tao
  
   On Tue, Jun 16, 2015 at 9:11 PM, Milinda Pathirage 
  mpath...@umail.iu.edu
   wrote:
  
Hi Devs,
   
I was looking for a way to measure Samza job throughput and found
 that
   its
possible to do it via Samza's metrics reporter. But there several
 types
   of
metrics reported via this method. For example, TaskInstanceMetrics
   reports
number of messages sent. But if I wanted to get a measurement like
  bytes
per second produced, is there a way to do that. It looks
like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide
   number
of messages sent.
   
If any of you have any experience in measuring Samza job throughput,
  can
you please share. Really appreciate any ideas on measuring job
   throughput.
   
Thanks
Milinda
--
Milinda Pathirage
   
PhD Student | Research Assistant
School of Informatics and Computing | Data to Insight Center
Indiana University
   
twitter: milindalakmal
skype: milinda.pathirage
blog: http://milinda.pathirage.org
   
  
 
 
 
  --
  Milinda Pathirage
 
  PhD Student | Research Assistant
  School of Informatics and Computing | Data to Insight Center
  Indiana University
 
  twitter: milindalakmal
  skype: milinda.pathirage
  blog: http://milinda.pathirage.org
 



Re: Measuring Samza Job Throughput

2015-06-17 Thread Tao Feng
Hi,

One metric I could think of related to Samza job throughput is the
process-envelop metric listed in SamzaContainerMetrics. This counter
get incremented whenever the container process meaningful message(
https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala

https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/container/SamzaContainerMetrics.scala
).

But this metric is more like a QPS type of metric .

Thanks,
-Tao

On Tue, Jun 16, 2015 at 9:11 PM, Milinda Pathirage mpath...@umail.iu.edu
wrote:

 Hi Devs,

 I was looking for a way to measure Samza job throughput and found that its
 possible to do it via Samza's metrics reporter. But there several types of
 metrics reported via this method. For example, TaskInstanceMetrics reports
 number of messages sent. But if I wanted to get a measurement like bytes
 per second produced, is there a way to do that. It looks
 like KafkaSystemProducerMetrics and TaskInstanceMetrics only provide number
 of messages sent.

 If any of you have any experience in measuring Samza job throughput, can
 you please share. Really appreciate any ideas on measuring job throughput.

 Thanks
 Milinda
 --
 Milinda Pathirage

 PhD Student | Research Assistant
 School of Informatics and Computing | Data to Insight Center
 Indiana University

 twitter: milindalakmal
 skype: milinda.pathirage
 blog: http://milinda.pathirage.org