Thanks Yi,

I opened SAMZA-815 <https://issues.apache.org/jira/browse/SAMZA-815>.


Rick


> On Nov 13, 2015, at 2:08 AM, Yi Pan <nickpa...@gmail.com> wrote:
> 
> Hi, Rick,
> 
> Yes, please open a JIRA with your configuration, deployment setup and
> sequence, and the logs from JobRunner.
> 
> Thanks a lot!
> 
> -Yi
> 
> On Thu, Nov 12, 2015 at 10:10 AM, Rick Mangi <r...@chartbeat.com> wrote:
> 
>> Hi Yi,
>> 
>> I pulled from master and built this morning.
>> 
>> Yes, that’s the output from JobRunner. I also tried setting a job.id to
>> see if this was an issue migrating from an old task checkpoint topic, but
>> I got the same result.
>> 
>> Would you like me to open a JIRA ticket?
>> 
>> Thanks,
>> 
>> Rick
>> 
>> 
>> 
>>> On Nov 12, 2015, at 12:59 PM, Yi Pan <nickpa...@gmail.com> wrote:
>>> 
>>> Hi, Rick,
>>> 
>>> Did your test build include the fix for SAMZA-723? And could you confirm
>>> that the errors are from the JobRunner log?
>>> 
>>> -Yi
>>> 
>>> On Thu, Nov 12, 2015 at 8:48 AM, Rick Mangi <r...@chartbeat.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I’m trying to migrate our Samza jobs to the 0.10.0 snapshot (built against
>>>> the latest). Everything works fine running locally (although I had to make
>>>> some changes to the local grid’s Kafka, since checkpointing seems to
>>>> require replication_factor > 1), but when I deploy against my production
>>>> YARN cluster I get these errors.
>>>> 
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkClient [INFO] zookeeper state
>>>> changed (SyncConnected)
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZkEventThread [INFO] Terminate
>>>> ZkClient event thread.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ZooKeeper [INFO] Session:
>>>> 0x250233cdf57f2fa closed
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ClientCnxn [INFO] EventThread
>>>> shut down
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemAdmin [INFO]
>>>> Coordinator stream __samza_coordinator_metrics-reporter_1 already exists.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 JobRunner [INFO] Storing config
>>>> in coordinator stream.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 CoordinatorStreamSystemProducer
>>>> [INFO] Starting coordinator stream producer.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 KafkaSystemProducer [INFO]
>>>> Creating a new producer for system mykafka.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [INFO]
>>>> ProducerConfig values:
>>>> [yarnmaster01] out:     value.serializer = class
>>>> org.apache.kafka.common.serialization.ByteArraySerializer
>>>> [yarnmaster01] out:     key.serializer = class
>>>> org.apache.kafka.common.serialization.ByteArraySerializer
>>>> [yarnmaster01] out:     block.on.buffer.full = true
>>>> [yarnmaster01] out:     retry.backoff.ms = 100
>>>> [yarnmaster01] out:     buffer.memory = 33554432
>>>> [yarnmaster01] out:     batch.size = 16384
>>>> [yarnmaster01] out:     metrics.sample.window.ms = 30000
>>>> [yarnmaster01] out:     metadata.max.age.ms = 300000
>>>> [yarnmaster01] out:     receive.buffer.bytes = 32768
>>>> [yarnmaster01] out:     timeout.ms = 30000
>>>> [yarnmaster01] out:     max.in.flight.requests.per.connection = 1
>>>> [yarnmaster01] out:     bootstrap.servers = [
>>>> devstream01.chartbeat.net:9092]
>>>> [yarnmaster01] out:     metric.reporters = []
>>>> [yarnmaster01] out:     client.id =
>>>> samza_producer-metrics_reporter-1-1447342853273-4
>>>> [yarnmaster01] out:     compression.type = none
>>>> [yarnmaster01] out:     retries = 2147483647
>>>> [yarnmaster01] out:     max.request.size = 1048576
>>>> [yarnmaster01] out:     send.buffer.bytes = 131072
>>>> [yarnmaster01] out:     acks = 1
>>>> [yarnmaster01] out:     reconnect.backoff.ms = 10
>>>> [yarnmaster01] out:     linger.ms = 0
>>>> [yarnmaster01] out:     metrics.num.samples = 2
>>>> [yarnmaster01] out:     metadata.fetch.timeout.ms = 60000
>>>> [yarnmaster01] out:
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The
>>>> configuration batch.num.messages = null was supplied but isn't a known
>>>> config.
>>>> [yarnmaster01] out: 2015-11-12 10:40:53 ProducerConfig [WARN] The
>>>> configuration producer.type = null was supplied but isn't a known
>>>> config.
>>>> [yarnmaster01] out: Exception in thread "main"
>>>> org.apache.samza.SamzaException:
>>>> org.apache.kafka.common.errors.TimeoutException: Failed to update
>>>> metadata after 60000 ms.
>>>> [yarnmaster01] out:     at
>>>> org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:115)
>>>> [yarnmaster01] out:     at
>>>> org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.writeConfig(CoordinatorStreamSystemProducer.java:132)
>>>> [yarnmaster01] out:     at
>>>> org.apache.samza.job.JobRunner.run(JobRunner.scala:85)
>>>> [yarnmaster01] out:     at
>>>> org.apache.samza.job.JobRunner$.main(JobRunner.scala:43)
>>>> [yarnmaster01] out:     at
>>>> org.apache.samza.job.JobRunner.main(JobRunner.scala)
>>>> [yarnmaster01] out: Caused by:
>>>> org.apache.kafka.common.errors.TimeoutException: Failed to update
>>>> metadata after 60000 ms.
>>>> [yarnmaster01] out:
>>>> 
>>>> 
>>>> Warning: run() received nonzero return code 1 while executing
>>>> './bin/run-job.sh
>>>> -config-factory=org.apache.samza.config.factories.PropertiesConfigFactory
>>>> --config-path=file://$PWD/conf/metrics_reporter.properties'!
>>>> 
>>>> 
>>>> This looks similar to https://issues.apache.org/jira/browse/SAMZA-560,
>>>> but I’m not using a StreamAppender in log4j.
>>>> 
>>>> Any ideas? My first thought is that I might have to delete the existing
>>>> checkpoint topics, but that would mean we can’t migrate completely until
>>>> the 0.10.0 release unless we want to run snapshot code in production.
>>>> 
>>>> Thanks!
>>>> 
>>>> Rick
>>>> 
>>>> 
>>>> 
>> 
>> 
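
For anyone hitting the same trace: "Failed to update metadata after 60000 ms" is what the Kafka producer reports when it cannot fetch topic metadata within metadata.fetch.timeout.ms. One thing worth ruling out (only one of several possible causes, and not confirmed as the cause in this thread) is whether the host running JobRunner can reach the broker listed in bootstrap.servers at all. Below is a minimal sketch of such a check using only the Python standard library. The function names are hypothetical helpers written for illustration, not part of Samza or Kafka, and a successful TCP connect only shows the port is open, not that the broker is healthy.

```python
import socket
from typing import List, Tuple


def parse_bootstrap_servers(servers: str) -> List[Tuple[str, int]]:
    """Split a bootstrap.servers value such as "h1:9092,h2:9092"
    into (host, port) pairs. Hypothetical helper for illustration."""
    pairs = []
    for entry in servers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray whitespace
        host, _, port = entry.rpartition(":")
        pairs.append((host, int(port)))
    return pairs


def broker_reachable(host: str, port: int, timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within
    timeout_s seconds. DNS failures, refusals and timeouts return False."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def check_brokers(servers: str) -> List[Tuple[str, int, bool]]:
    """Check every broker in a bootstrap.servers string and return
    (host, port, reachable) triples."""
    return [
        (host, port, broker_reachable(host, port))
        for host, port in parse_bootstrap_servers(servers)
    ]
```

Running check_brokers("devstream01.chartbeat.net:9092") on the machine that launches run-job.sh (yarnmaster01 in the log above) would show whether the broker address from the ProducerConfig dump is reachable from there. If it is, the timeout more likely comes from something else, such as the topic metadata itself.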
