Thanks Yi, I opened SAMZA-815 <https://issues.apache.org/jira/browse/SAMZA-815>
Rick

> On Nov 13, 2015, at 2:08 AM, Yi Pan <nickpa...@gmail.com> wrote:
>
> Hi, Rick,
>
> Yes, please open a JIRA w/ your configuration, deployment set up and
> sequence, and logs from JobRunner.
>
> Thanks a lot!
>
> -Yi
>
> On Thu, Nov 12, 2015 at 10:10 AM, Rick Mangi <r...@chartbeat.com> wrote:
>
>> Hi Yi,
>>
>> I pulled from master and built this morning.
>>
>> Yes, that’s the output from JobRunner. I also tried setting a job.id to
>> see if this was an issue migrating from an old task checkpoint topic but I
>> got the same result.
>>
>> Would you like me to open a jira ticket?
>>
>> Thanks,
>>
>> Rick
>>
>>> On Nov 12, 2015, at 12:59 PM, Yi Pan <nickpa...@gmail.com> wrote:
>>>
>>> Hi, Rick,
>>>
>>> Did you get the fix in SAMZA-723 in your test? And could you confirm that
>>> the errors are from JobRunner log?
>>>
>>> -Yi
>>>
>>> On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I’m trying to migrate our samza jobs to 0.10.0 snapshot (built against the
>>>> latest). Everything works fine running locally (although I had to make some
>>>> changes to the local grid’s kafka since the checkpointing seems to require
>>>> replication_factor > 1) but when I deploy it against my production yarn
>>>> cluster I get these errors.
>>>>
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state changed (SyncConnected)
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate ZkClient event thread.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session: 0x250233cdf57f2fa closed
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread shut down
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO] Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config in coordinator stream.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer [INFO] Starting coordinator stream producer.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO] Creating a new producer for system mykafka.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO] ProducerConfig values:
>>>> [yarnmaster01] out: value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>>>> [yarnmaster01] out: key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
>>>> [yarnmaster01] out: block.on.buffer.full = true
>>>> [yarnmaster01] out: retry.backoff.ms = 100
>>>> [yarnmaster01] out: buffer.memory = 33554432
>>>> [yarnmaster01] out: batch.size = 16384
>>>> [yarnmaster01] out: metrics.sample.window.ms = 30000
>>>> [yarnmaster01] out: metadata.max.age.ms = 300000
>>>> [yarnmaster01] out: receive.buffer.bytes = 32768
>>>> [yarnmaster01] out: timeout.ms = 30000
>>>> [yarnmaster01] out: max.in.flight.requests.per.connection = 1
>>>> [yarnmaster01] out: bootstrap.servers = [devstream01.chartbeat.net:9092]
>>>> [yarnmaster01] out: metric.reporters = []
>>>> [yarnmaster01] out: client.id = samza_producer-metrics_reporter-1-1447342853273-4
>>>> [yarnmaster01] out: compression.type = none
>>>> [yarnmaster01] out: retries = 2147483647
>>>> [yarnmaster01] out: max.request.size = 1048576
>>>> [yarnmaster01] out: send.buffer.bytes = 131072
>>>> [yarnmaster01] out: acks = 1
>>>> [yarnmaster01] out: reconnect.backoff.ms = 10
>>>> [yarnmaster01] out: linger.ms = 0
>>>> [yarnmaster01] out: metrics.num.samples = 2
>>>> [yarnmaster01] out: metadata.fetch.timeout.ms = 60000
>>>> [yarnmaster01] out:
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration batch.num.messages = null was supplied but isn't a known config.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The configuration producer.type = null was supplied but isn't a known config.
>>>> [yarnmaster01] out: Exception in thread "main" org.apache.samza.SamzaException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>>>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
>>>> [yarnmaster01] out: at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
>>>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
>>>> [yarnmaster01] out: at org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
>>>> [yarnmaster01] out: at org.apache.samza.job.JobRunner.main(JobRunner.scala)
>>>> [yarnmaster01] out: Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
>>>> [yarnmaster01] out:
>>>>
>>>> Warning: run() received nonzero return code 1 while executing
>>>> './bin/run-job.sh -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/conf/metrics_reporter.properties'!
>>>>
>>>> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560 but
>>>> I’m not using a StreamAppender in log4j.
>>>>
>>>> Any ideas? My first thought is that I might have to delete the existing
>>>> checkpoint topics but that would mean we can’t migrate completely until the
>>>> 10.0 release unless we want to run snapshot code in production.
>>>>
>>>> Thanks!
>>>>
>>>> Rick