Re: Samza hung after bootstrapping

2015-06-21 Thread Yan Fang
Hi Roger,

I will try to look at the issue tomorrow if my time allows.

First thing first:

The build has some unexpected results. A quick fix:

1. apply https://issues.apache.org/jira/browse/SAMZA-712
2. add

sourceSets.main.scala.srcDir src/main/java sourceSets.main.java.srcDirs =
[]

at line 126 of build.gradle.

Sorry for the inconvenience.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover roger.hoo...@gmail.com
wrote:

 Was looking through the code a little and it looks like the
 BootstrappingChooser could use the list of SSPs passed into it's register()
 method to figure out which partitions it need to monitor.

 I wanted to try to build Samza to play around with it but I'm getting error
 trying to build off of both the 0.9.0 and 0.9.1 branches.

 thedude:samza (0.9.1) $ ./gradlew clean build

 To honour the JVM settings for this build a new JVM will be forked. Please
 consider using the daemon:
 http://gradle.org/docs/2.0/userguide/gradle_daemon.html.

 :clean

 :samza-api:clean

 :samza-core_2.10:clean

 :samza-kafka_2.10:clean UP-TO-DATE

 :samza-kv-inmemory_2.10:clean UP-TO-DATE

 :samza-kv-rocksdb_2.10:clean UP-TO-DATE

 :samza-kv_2.10:clean UP-TO-DATE

 :samza-log4j:clean UP-TO-DATE

 :samza-shell:clean UP-TO-DATE

 :samza-test_2.10:clean UP-TO-DATE

 :samza-yarn_2.10:clean UP-TO-DATE

 :assemble UP-TO-DATE

 :rat

 Rat report: build/rat/rat-report.html

 :check

 :build

 :samza-api:compileJava

 :samza-api:processResources UP-TO-DATE

 :samza-api:classes

 :samza-api:jar

 :samza-api:javadoc


 /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
 warning: no @param for ssp

   void setStartingOffset(SystemStreamPartition ssp, String offset);

^


 /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
 warning: no @param for offset

   void setStartingOffset(SystemStreamPartition ssp, String offset);

^

 2 warnings

 :samza-api:javadocJar

 :samza-api:sourcesJar

 :samza-api:signArchives SKIPPED

 :samza-api:assemble

 :samza-api:compileTestJava

 :samza-api:processTestResources UP-TO-DATE

 :samza-api:testClasses

 :samza-api:test

 :samza-api:check

 :samza-api:build

 :samza-core_2.10:compileJava

 :samza-core_2.10:compileScala

 [ant:scalac]

 /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
 error: object SamzaObjectMapper is not a member of package
 org.apache.samza.serializers.model

 [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper

 [ant:scalac]^

 [ant:scalac]

 /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
 error: object TaskModel is not a member of package
 org.apache.samza.job.model

 [ant:scalac] import org.apache.samza.job.model.TaskModel

 [ant:scalac]^

 ...


 I've got JDK 8 installed.  Wondering that makes a difference or not.  I'd
 appreciate any help.

 Thanks,

 Roger



 On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover roger.hoo...@gmail.com
 wrote:

  I think I see what's happening.
 
  When there are 8 tasks and I set yarn.container.count=8, then each
  container is responsible for a single task.  However, the
  systemStreamLagCounts map (
 
 https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77
 )
  and laggingSystemStreamPartitions (
 
 https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83
 )
  are configured to track all partitions for the bootstrap topic rather
 than
  just the one partition assigned to this task.
 
  Later in the log, we see that the task/container completed bootstrap for
  it's own partition.
 
  2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
  [DEBUG] Bootstrap stream partition is fully caught up:
  SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
 
  but the Bootstrapping Chooser still thinks that the remaining partitions
  (assigned to other tasks in other containers) need to be completed.  JMX
 at
  this point shows 7 lagging partitions of the 8 original partition count.
 
  I'm wondering why no one has run into this.  Doesn't LinkedIn use
  partitioned bootstrapped topics?
 
  Thanks,
 
  Roger
 
  On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover roger.hoo...@gmail.com
  wrote:
 
  Hi Yan,
 
  I've uploaded a file with TRACE level logging here:
  http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
 
  I really appreciate your help as this is a critical issue for me.
 
  Thanks,
 
  Roger
 
  On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com
 wrote:
 
  Hi Roger,
 
   but it only spawns one container and still hangs after bootstrap
  -- this probably is due to your local machine does not have enough
  resource for the second container. 

Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
Hi Yan,

I've uploaded a file with TRACE level logging here:
http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz

I really appreciate your help as this is a critical issue for me.

Thanks,

Roger

On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com wrote:

 Hi Roger,

  but it only spawns one container and still hangs after bootstrap
 -- this probably is due to your local machine does not have enough
 resource for the second container. Because I checked your log file, each
 container is about 4GB.

 When I run it on our YARN cluster with a single container, it works
 correctly.  When I tried it with 5 containers, it gets hung after consuming
 the bootstrap topic.
-- Have you figure it out? I have a looked at your log and also the
 code. My suspect is that, there is a null enveloper somehow blocking the
 process. If you can paste the trace level log, it will be more helpful
 because many logs in chooser are trace level.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com
 wrote:

  I need some help.  I have a job which bootstraps one stream and then is
  supposed to read from two.  When I run it on our YARN cluster with a
 single
  container, it works correctly.  When I tried it with 5 containers, it
 gets
  hung after consuming the bootstrap topic.  I ran it with the grid script
 on
  my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
  container and still hangs after bootstrap.
 
  Debug logs are here: http://pastebin.com/af3KPvju
 
  I looked at JMX metrics and see:
  - Task Metrics - no value for kafka offset of non-bootstrapped stream
  -  SystemConsumerMetrics
  - choose null keeps incrementing
   - ssps-needed-by-chooser 1
- unprocessed-messages 62k
  - Bootstrapping Chooser
- lagging partitions 4
- laggin-batch-streams - 4
- batch-resets - 0
 
  Has anyone seen this or can offer ideas of how to better debug it?
 
  I'm using Samza 0.9.0 and YARN 2.4.0.
 
  Thanks!
 
  Roger
 



Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
I think I see what's happening.

When there are 8 tasks and I set yarn.container.count=8, then each
container is responsible for a single task.  However, the
systemStreamLagCounts map (
https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
and laggingSystemStreamPartitions (
https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
are configured to track all partitions for the bootstrap topic rather than
just the one partition assigned to this task.

Later in the log, we see that the task/container completed bootstrap for
it's own partition.

2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
[DEBUG] Bootstrap stream partition is fully caught up:
SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]

but the Bootstrapping Chooser still thinks that the remaining partitions
(assigned to other tasks in other containers) need to be completed.  JMX at
this point shows 7 lagging partitions of the 8 original partition count.

I'm wondering why no one has run into this.  Doesn't LinkedIn use
partitioned bootstrapped topics?

Thanks,

Roger

On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover roger.hoo...@gmail.com
wrote:

 Hi Yan,

 I've uploaded a file with TRACE level logging here:
 http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz

 I really appreciate your help as this is a critical issue for me.

 Thanks,

 Roger

 On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com wrote:

 Hi Roger,

  but it only spawns one container and still hangs after bootstrap
 -- this probably is due to your local machine does not have enough
 resource for the second container. Because I checked your log file, each
 container is about 4GB.

 When I run it on our YARN cluster with a single container, it works
 correctly.  When I tried it with 5 containers, it gets hung after
 consuming
 the bootstrap topic.
-- Have you figure it out? I have a looked at your log and also the
 code. My suspect is that, there is a null enveloper somehow blocking the
 process. If you can paste the trace level log, it will be more helpful
 because many logs in chooser are trace level.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com
 wrote:

  I need some help.  I have a job which bootstraps one stream and then is
  supposed to read from two.  When I run it on our YARN cluster with a
 single
  container, it works correctly.  When I tried it with 5 containers, it
 gets
  hung after consuming the bootstrap topic.  I ran it with the grid
 script on
  my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
  container and still hangs after bootstrap.
 
  Debug logs are here: http://pastebin.com/af3KPvju
 
  I looked at JMX metrics and see:
  - Task Metrics - no value for kafka offset of non-bootstrapped stream
  -  SystemConsumerMetrics
  - choose null keeps incrementing
   - ssps-needed-by-chooser 1
- unprocessed-messages 62k
  - Bootstrapping Chooser
- lagging partitions 4
- laggin-batch-streams - 4
- batch-resets - 0
 
  Has anyone seen this or can offer ideas of how to better debug it?
 
  I'm using Samza 0.9.0 and YARN 2.4.0.
 
  Thanks!
 
  Roger
 





Re: [VOTE] Apache Samza 0.9.1 RC0

2015-06-21 Thread Roger Hoover
Hi all,

Do you think we could get this bootstrapping bug fixed before 0.9.1
release?  It seems like a critical bug.

https://issues.apache.org/jira/browse/SAMZA-720

Thanks,

Roger

On Sat, Jun 20, 2015 at 10:38 PM, Yan Fang yanfang...@gmail.com wrote:

 Agree. I will test it this weekend.

 Thanks,

 Fang, Yan
 yanfang...@gmail.com

 On Sat, Jun 20, 2015 at 3:46 PM, Guozhang Wang wangg...@gmail.com wrote:

  Since we only get one vote so far, I think I have to extend the vote
  deadline. Let's set it to next Monday 6pm.
 
  Please check the candidate and vote for your opinions.
 
  Guozhang
 
  On Fri, Jun 19, 2015 at 10:03 AM, Yi Pan nickpa...@gmail.com wrote:
 
   +1. Ran the Samza failure test suite and succeeded over night.
  
   On Wed, Jun 17, 2015 at 5:54 PM, Guozhang Wang wangg...@gmail.com
  wrote:
  
Hey all,
   
This is a call for a vote on a release of Apache Samza 0.9.1. This
 is a
bug-fix release against 0.9.0.
   
The release candidate can be downloaded from here:
   
http://people.apache.org/~guozhang/samza-0.9.1-rc0/
   
The release candidate is signed with pgp key 911402D8, which is
included in the repository's KEYS file:
   
   
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95
   
and can also be found on keyservers:
   
http://pgp.mit.edu/pks/lookup?op=getsearch=0x911402D8
   
The git tag is release-0.9.1-rc0 and signed with the same pgp key:
   
   
   
  
 
 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=e78b9e7f34650538b4bb68b338eb472b98a5709e
   
Test binaries have been published to Maven's staging repository, and
  are
available here:
   
5 critical bugs were resolved for this release:
   
   
   
  
 
 https://issues.apache.org/jira/browse/SAMZA-715?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20%28Resolved%2C%20Closed%29
   
The vote will be open for 72 hours ( end in 6:00pm Saturday,
 06/20/2015
   ).
Please download the release candidate, check the hashes/signature,
  build
   it
and test it, and then please vote:
   
[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)
   
-- Guozhang
   
  
 
 
 
  --
  -- Guozhang
 



Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store

2015-06-21 Thread Yan Fang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33419/
---

(Updated June 21, 2015, 6:10 a.m.)


Review request for samza.


Changes
---

update based on Navina's comment.


Bugs: SAMZA-625
https://issues.apache.org/jira/browse/SAMZA-625


Repository: samza


Description
---

Implemented in Java.

* modified build.gradle to have the gradle compile scala first. Because some 
jave code has dependencies to Scala code
* change the state store name by removing the space ( in TaskManager )
* add scala java conversion method in Util because some classes only accept 
scala map
* add java version of some configs 
* remove duplicated config in samza-log4j
* add StorageRevoery class, which does most of the recoverying job. The logic 
mimics what happens in SamzaContainer.
* add StateStorageTool, for the commandline usage
* unit tests
* docs


Diffs (updated)
-

  checkstyle/import-control.xml 3374f0c 
  docs/learn/documentation/versioned/container/state-management.md 79067bb 
  samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/storage/StateStorageTool.java 
PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java 
PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
aeba61a 
  samza-core/src/main/scala/org/apache/samza/util/Util.scala 2feb65b 
  samza-core/src/test/java/org/apache/samza/config/TestJavaStorageConfig.java 
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/config/TestJavaSystemConfig.java 
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/storage/MockStorageEngine.java 
PRE-CREATION 
  
samza-core/src/test/java/org/apache/samza/storage/MockStorageEngineFactory.java 
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/storage/MockSystemConsumer.java 
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/storage/MockSystemFactory.java 
PRE-CREATION 
  samza-core/src/test/java/org/apache/samza/storage/TestStorageRecovery.java 
PRE-CREATION 
  samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
d5e24f2 
  samza-shell/src/main/bash/state-storage-tool.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/33419/diff/


Testing
---

tested with multiple partitions and multiple stores recovery.


Thanks,

Yan Fang



Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store

2015-06-21 Thread Yan Fang


 On May 21, 2015, 6:45 p.m., Navina Ramesh wrote:
  samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java, 
  line 28
  https://reviews.apache.org/r/33419/diff/3/?file=965787#file965787line28
 
  Is moving away from scala based configs a motivation for all the 
  Java prefixed classes?
  
  If the plan is to refactor configs everywhere with Java based configs, 
  then we should probably extend these classes from 
  org.apache.samza.config.MapConfig. It provides some convenient methods 
  because it is backed by a Map. Just curious if you explored that option.

This makes sense. I omitted this option. Thanks. Using MapConfig now. Also 
added unit test to verify that.


 On May 21, 2015, 6:45 p.m., Navina Ramesh wrote:
  samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java, 
  line 33
  https://reviews.apache.org/r/33419/diff/3/?file=965788#file965788line33
 
  Why is Config protected here and private in JavaStorageConfig?

fixed in above change.


 On May 21, 2015, 6:45 p.m., Navina Ramesh wrote:
  samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
   line 39
  https://reviews.apache.org/r/33419/diff/3/?file=965791#file965791line39
 
  why not sanitize the entire file name, instead of just the task name?
  Personally, I think we should follow a standard convention for 
  sanitizing names, irrespective of whether it is storename or taskname. Just 
  my two cents.

aggreed.


- Yan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33419/#review84612
---


On June 21, 2015, 6:10 a.m., Yan Fang wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/33419/
 ---
 
 (Updated June 21, 2015, 6:10 a.m.)
 
 
 Review request for samza.
 
 
 Bugs: SAMZA-625
 https://issues.apache.org/jira/browse/SAMZA-625
 
 
 Repository: samza
 
 
 Description
 ---
 
 Implemented in Java.
 
 * modified build.gradle to have the gradle compile scala first. Because some 
 jave code has dependencies to Scala code
 * change the state store name by removing the space ( in TaskManager )
 * add scala java conversion method in Util because some classes only accept 
 scala map
 * add java version of some configs 
 * remove duplicated config in samza-log4j
 * add StorageRevoery class, which does most of the recoverying job. The logic 
 mimics what happens in SamzaContainer.
 * add StateStorageTool, for the commandline usage
 * unit tests
 * docs
 
 
 Diffs
 -
 
   checkstyle/import-control.xml 3374f0c 
   docs/learn/documentation/versioned/container/state-management.md 79067bb 
   samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java 
 PRE-CREATION 
   samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java 
 PRE-CREATION 
   samza-core/src/main/java/org/apache/samza/storage/StateStorageTool.java 
 PRE-CREATION 
   samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java 
 PRE-CREATION 
   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
 aeba61a 
   samza-core/src/main/scala/org/apache/samza/util/Util.scala 2feb65b 
   samza-core/src/test/java/org/apache/samza/config/TestJavaStorageConfig.java 
 PRE-CREATION 
   samza-core/src/test/java/org/apache/samza/config/TestJavaSystemConfig.java 
 PRE-CREATION 
   samza-core/src/test/java/org/apache/samza/storage/MockStorageEngine.java 
 PRE-CREATION 
   
 samza-core/src/test/java/org/apache/samza/storage/MockStorageEngineFactory.java
  PRE-CREATION 
   samza-core/src/test/java/org/apache/samza/storage/MockSystemConsumer.java 
 PRE-CREATION 
   samza-core/src/test/java/org/apache/samza/storage/MockSystemFactory.java 
 PRE-CREATION 
   samza-core/src/test/java/org/apache/samza/storage/TestStorageRecovery.java 
 PRE-CREATION 
   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
 d5e24f2 
   samza-shell/src/main/bash/state-storage-tool.sh PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/33419/diff/
 
 
 Testing
 ---
 
 tested with multiple partitions and multiple stores recovery.
 
 
 Thanks,
 
 Yan Fang