Re: Samza hung after bootstrapping
Hi Roger, I will try to look at the issue tomorrow if my time allows. First thing first: The build has some unexpected results. A quick fix: 1. apply https://issues.apache.org/jira/browse/SAMZA-712 2. add sourceSets.main.scala.srcDir src/main/java sourceSets.main.java.srcDirs = [] at line 126 of build.gradle. Sorry for the inconvenience. Thanks, Fang, Yan yanfang...@gmail.com On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover roger.hoo...@gmail.com wrote: Was looking through the code a little and it looks like the BootstrappingChooser could use the list of SSPs passed into it's register() method to figure out which partitions it need to monitor. I wanted to try to build Samza to play around with it but I'm getting error trying to build off of both the 0.9.0 and 0.9.1 branches. thedude:samza (0.9.1) $ ./gradlew clean build To honour the JVM settings for this build a new JVM will be forked. Please consider using the daemon: http://gradle.org/docs/2.0/userguide/gradle_daemon.html. :clean :samza-api:clean :samza-core_2.10:clean :samza-kafka_2.10:clean UP-TO-DATE :samza-kv-inmemory_2.10:clean UP-TO-DATE :samza-kv-rocksdb_2.10:clean UP-TO-DATE :samza-kv_2.10:clean UP-TO-DATE :samza-log4j:clean UP-TO-DATE :samza-shell:clean UP-TO-DATE :samza-test_2.10:clean UP-TO-DATE :samza-yarn_2.10:clean UP-TO-DATE :assemble UP-TO-DATE :rat Rat report: build/rat/rat-report.html :check :build :samza-api:compileJava :samza-api:processResources UP-TO-DATE :samza-api:classes :samza-api:jar :samza-api:javadoc /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for ssp void setStartingOffset(SystemStreamPartition ssp, String offset); ^ /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for offset void setStartingOffset(SystemStreamPartition ssp, String offset); ^ 2 warnings :samza-api:javadocJar :samza-api:sourcesJar :samza-api:signArchives SKIPPED :samza-api:assemble :samza-api:compileTestJava :samza-api:processTestResources UP-TO-DATE :samza-api:testClasses :samza-api:test :samza-api:check :samza-api:build :samza-core_2.10:compileJava :samza-core_2.10:compileScala [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43: error: object SamzaObjectMapper is not a member of package org.apache.samza.serializers.model [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper [ant:scalac]^ [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40: error: object TaskModel is not a member of package org.apache.samza.job.model [ant:scalac] import org.apache.samza.job.model.TaskModel [ant:scalac]^ ... I've got JDK 8 installed. Wondering that makes a difference or not. I'd appreciate any help. Thanks, Roger On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover roger.hoo...@gmail.com wrote: I think I see what's happening. When there are 8 tasks and I set yarn.container.count=8, then each container is responsible for a single task. However, the systemStreamLagCounts map ( https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77 ) and laggingSystemStreamPartitions ( https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83 ) are configured to track all partitions for the bootstrap topic rather than just the one partition assigned to this task. Later in the log, we see that the task/container completed bootstrap for it's own partition. 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser [DEBUG] Bootstrap stream partition is fully caught up: SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0] but the Bootstrapping Chooser still thinks that the remaining partitions (assigned to other tasks in other containers) need to be completed. JMX at this point shows 7 lagging partitions of the 8 original partition count. I'm wondering why no one has run into this. Doesn't LinkedIn use partitioned bootstrapped topics? Thanks, Roger On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover roger.hoo...@gmail.com wrote: Hi Yan, I've uploaded a file with TRACE level logging here: http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz I really appreciate your help as this is a critical issue for me. Thanks, Roger On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com wrote: Hi Roger, but it only spawns one container and still hangs after bootstrap -- this probably is due to your local machine does not have enough resource for the second container.
Re: Samza hung after bootstrapping
Hi Yan, I've uploaded a file with TRACE level logging here: http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz I really appreciate your help as this is a critical issue for me. Thanks, Roger On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com wrote: Hi Roger, but it only spawns one container and still hangs after bootstrap -- this probably is due to your local machine does not have enough resource for the second container. Because I checked your log file, each container is about 4GB. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic. -- Have you figure it out? I have a looked at your log and also the code. My suspect is that, there is a null enveloper somehow blocking the process. If you can paste the trace level log, it will be more helpful because many logs in chooser are trace level. Thanks, Fang, Yan yanfang...@gmail.com On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com wrote: I need some help. I have a job which bootstraps one stream and then is supposed to read from two. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic. I ran it with the grid script on my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one container and still hangs after bootstrap. Debug logs are here: http://pastebin.com/af3KPvju I looked at JMX metrics and see: - Task Metrics - no value for kafka offset of non-bootstrapped stream - SystemConsumerMetrics - choose null keeps incrementing - ssps-needed-by-chooser 1 - unprocessed-messages 62k - Bootstrapping Chooser - lagging partitions 4 - laggin-batch-streams - 4 - batch-resets - 0 Has anyone seen this or can offer ideas of how to better debug it? I'm using Samza 0.9.0 and YARN 2.4.0. Thanks! Roger
Re: Samza hung after bootstrapping
I think I see what's happening. When there are 8 tasks and I set yarn.container.count=8, then each container is responsible for a single task. However, the systemStreamLagCounts map ( https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77) and laggingSystemStreamPartitions ( https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83) are configured to track all partitions for the bootstrap topic rather than just the one partition assigned to this task. Later in the log, we see that the task/container completed bootstrap for it's own partition. 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser [DEBUG] Bootstrap stream partition is fully caught up: SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0] but the Bootstrapping Chooser still thinks that the remaining partitions (assigned to other tasks in other containers) need to be completed. JMX at this point shows 7 lagging partitions of the 8 original partition count. I'm wondering why no one has run into this. Doesn't LinkedIn use partitioned bootstrapped topics? Thanks, Roger On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover roger.hoo...@gmail.com wrote: Hi Yan, I've uploaded a file with TRACE level logging here: http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz I really appreciate your help as this is a critical issue for me. Thanks, Roger On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang yanfang...@gmail.com wrote: Hi Roger, but it only spawns one container and still hangs after bootstrap -- this probably is due to your local machine does not have enough resource for the second container. Because I checked your log file, each container is about 4GB. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic. -- Have you figure it out? I have a looked at your log and also the code. My suspect is that, there is a null enveloper somehow blocking the process. If you can paste the trace level log, it will be more helpful because many logs in chooser are trace level. Thanks, Fang, Yan yanfang...@gmail.com On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover roger.hoo...@gmail.com wrote: I need some help. I have a job which bootstraps one stream and then is supposed to read from two. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic. I ran it with the grid script on my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one container and still hangs after bootstrap. Debug logs are here: http://pastebin.com/af3KPvju I looked at JMX metrics and see: - Task Metrics - no value for kafka offset of non-bootstrapped stream - SystemConsumerMetrics - choose null keeps incrementing - ssps-needed-by-chooser 1 - unprocessed-messages 62k - Bootstrapping Chooser - lagging partitions 4 - laggin-batch-streams - 4 - batch-resets - 0 Has anyone seen this or can offer ideas of how to better debug it? I'm using Samza 0.9.0 and YARN 2.4.0. Thanks! Roger
Re: [VOTE] Apache Samza 0.9.1 RC0
Hi all, Do you think we could get this bootstrapping bug fixed before 0.9.1 release? It seems like a critical bug. https://issues.apache.org/jira/browse/SAMZA-720 Thanks, Roger On Sat, Jun 20, 2015 at 10:38 PM, Yan Fang yanfang...@gmail.com wrote: Agree. I will test it this weekend. Thanks, Fang, Yan yanfang...@gmail.com On Sat, Jun 20, 2015 at 3:46 PM, Guozhang Wang wangg...@gmail.com wrote: Since we only get one vote so far, I think I have to extend the vote deadline. Let's set it to next Monday 6pm. Please check the candidate and vote for your opinions. Guozhang On Fri, Jun 19, 2015 at 10:03 AM, Yi Pan nickpa...@gmail.com wrote: +1. Ran the Samza failure test suite and succeeded over night. On Wed, Jun 17, 2015 at 5:54 PM, Guozhang Wang wangg...@gmail.com wrote: Hey all, This is a call for a vote on a release of Apache Samza 0.9.1. This is a bug-fix release against 0.9.0. The release candidate can be downloaded from here: http://people.apache.org/~guozhang/samza-0.9.1-rc0/ The release candidate is signed with pgp key 911402D8, which is included in the repository's KEYS file: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95 and can also be found on keyservers: http://pgp.mit.edu/pks/lookup?op=getsearch=0x911402D8 The git tag is release-0.9.1-rc0 and signed with the same pgp key: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=e78b9e7f34650538b4bb68b338eb472b98a5709e Test binaries have been published to Maven's staging repository, and are available here: 5 critical bugs were resolved for this release: https://issues.apache.org/jira/browse/SAMZA-715?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.1%20AND%20status%20in%20%28Resolved%2C%20Closed%29 The vote will be open for 72 hours ( end in 6:00pm Saturday, 06/20/2015 ). Please download the release candidate, check the hashes/signature, build it and test it, and then please vote: [ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) -- Guozhang -- -- Guozhang
Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33419/ --- (Updated June 21, 2015, 6:10 a.m.) Review request for samza. Changes --- update based on Navina's comment. Bugs: SAMZA-625 https://issues.apache.org/jira/browse/SAMZA-625 Repository: samza Description --- Implemented in Java. * modified build.gradle to have the gradle compile scala first. Because some jave code has dependencies to Scala code * change the state store name by removing the space ( in TaskManager ) * add scala java conversion method in Util because some classes only accept scala map * add java version of some configs * remove duplicated config in samza-log4j * add StorageRevoery class, which does most of the recoverying job. The logic mimics what happens in SamzaContainer. * add StateStorageTool, for the commandline usage * unit tests * docs Diffs (updated) - checkstyle/import-control.xml 3374f0c docs/learn/documentation/versioned/container/state-management.md 79067bb samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java PRE-CREATION samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java PRE-CREATION samza-core/src/main/java/org/apache/samza/storage/StateStorageTool.java PRE-CREATION samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java PRE-CREATION samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala aeba61a samza-core/src/main/scala/org/apache/samza/util/Util.scala 2feb65b samza-core/src/test/java/org/apache/samza/config/TestJavaStorageConfig.java PRE-CREATION samza-core/src/test/java/org/apache/samza/config/TestJavaSystemConfig.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockStorageEngine.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockStorageEngineFactory.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockSystemConsumer.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockSystemFactory.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/TestStorageRecovery.java PRE-CREATION samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java d5e24f2 samza-shell/src/main/bash/state-storage-tool.sh PRE-CREATION Diff: https://reviews.apache.org/r/33419/diff/ Testing --- tested with multiple partitions and multiple stores recovery. Thanks, Yan Fang
Re: Review Request 33419: SAMZA-625: Provide tool to consume changelog and materialize a state store
On May 21, 2015, 6:45 p.m., Navina Ramesh wrote: samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java, line 28 https://reviews.apache.org/r/33419/diff/3/?file=965787#file965787line28 Is moving away from scala based configs a motivation for all the Java prefixed classes? If the plan is to refactor configs everywhere with Java based configs, then we should probably extend these classes from org.apache.samza.config.MapConfig. It provides some convenient methods because it is backed by a Map. Just curious if you explored that option. This makes sense. I omitted this option. Thanks. Using MapConfig now. Also added unit test to verify that. On May 21, 2015, 6:45 p.m., Navina Ramesh wrote: samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java, line 33 https://reviews.apache.org/r/33419/diff/3/?file=965788#file965788line33 Why is Config protected here and private in JavaStorageConfig? fixed in above change. On May 21, 2015, 6:45 p.m., Navina Ramesh wrote: samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala, line 39 https://reviews.apache.org/r/33419/diff/3/?file=965791#file965791line39 why not sanitize the entire file name, instead of just the task name? Personally, I think we should follow a standard convention for sanitizing names, irrespective of whether it is storename or taskname. Just my two cents. aggreed. - Yan --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33419/#review84612 --- On June 21, 2015, 6:10 a.m., Yan Fang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/33419/ --- (Updated June 21, 2015, 6:10 a.m.) Review request for samza. Bugs: SAMZA-625 https://issues.apache.org/jira/browse/SAMZA-625 Repository: samza Description --- Implemented in Java. * modified build.gradle to have the gradle compile scala first. Because some jave code has dependencies to Scala code * change the state store name by removing the space ( in TaskManager ) * add scala java conversion method in Util because some classes only accept scala map * add java version of some configs * remove duplicated config in samza-log4j * add StorageRevoery class, which does most of the recoverying job. The logic mimics what happens in SamzaContainer. * add StateStorageTool, for the commandline usage * unit tests * docs Diffs - checkstyle/import-control.xml 3374f0c docs/learn/documentation/versioned/container/state-management.md 79067bb samza-core/src/main/java/org/apache/samza/config/JavaStorageConfig.java PRE-CREATION samza-core/src/main/java/org/apache/samza/config/JavaSystemConfig.java PRE-CREATION samza-core/src/main/java/org/apache/samza/storage/StateStorageTool.java PRE-CREATION samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java PRE-CREATION samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala aeba61a samza-core/src/main/scala/org/apache/samza/util/Util.scala 2feb65b samza-core/src/test/java/org/apache/samza/config/TestJavaStorageConfig.java PRE-CREATION samza-core/src/test/java/org/apache/samza/config/TestJavaSystemConfig.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockStorageEngine.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockStorageEngineFactory.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockSystemConsumer.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/MockSystemFactory.java PRE-CREATION samza-core/src/test/java/org/apache/samza/storage/TestStorageRecovery.java PRE-CREATION samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java d5e24f2 samza-shell/src/main/bash/state-storage-tool.sh PRE-CREATION Diff: https://reviews.apache.org/r/33419/diff/ Testing --- tested with multiple partitions and multiple stores recovery. Thanks, Yan Fang