Re: java.lang.OutOfMemoryError
In Storm, an OutOfMemoryError is typically caused by unintentionally accumulating references to objects that should have been freed. Such a bug can be pinpointed and fixed by analyzing the heap dump file. In particular, when an OOM occurs in a worker, a dump file will be generated (in the same folder as the storm binary, e.g., storm/bin/), which is an instantaneous snapshot of the heap. You can use a heap-dump analysis tool such as VisualVM to see whether any class has an unexpectedly large number of instances. If so, it is very likely that you forgot to remove the references to those instances after use. Hope this helps. Li Wang Sent from my iPhone > On 24 Jul 2017, at 08:49, sam mohel wrote: > > I got this error after submitted the topology > java.lang.OutOfMemoryError: Java heap space at > cern.colt.matrix.impl.DenseDoubleMatrix1D.(Unknown Source) at > trident.state.BucketsDB.updateRandomVectors(BucketsDB.java:126) at > trident.state.q > > this error appeared after 8 hours from getting results > How can i solve it ?
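One note: the JVM writes a heap dump on OOM only when started with -XX:+HeapDumpOnOutOfMemoryError, which can be added to worker.childopts in storm.yaml if no dump file appears. To illustrate the leak pattern described above, here is a minimal sketch in plain Java (the class and method names are hypothetical, not from the topology in question): state is cached per key but never released, so every entry stays strongly reachable and the heap grows without bound until the reference is explicitly removed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the leak pattern: per-key state is cached
// on update but never evicted, so the map retains a strong reference
// to every vector ever seen and the heap grows until OOM.
class StateCache {
    private final Map<String, double[]> vectors = new HashMap<>();

    void update(String key, double[] vector) {
        vectors.put(key, vector); // reference retained forever unless removed
    }

    // The fix: drop the reference once processing of the key is complete.
    void complete(String key) {
        vectors.remove(key);
    }

    int size() {
        return vectors.size();
    }
}
```

In a heap-dump viewer, the leaking class (here, double[] held by the map) would show an instance count that grows steadily over the 8 hours until the error.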
Re: Connection refused
Make sure nimbus is running properly. Note also that your stack trace shows the connection attempt is made by backtype.storm.utils.DRPCClient, so check as well that the DRPC server is running (it is started with the storm drpc command) and that drpc.servers is configured in storm.yaml. Sent from my iPhone > On 18 Jul 2017, at 09:16, sam mohel wrote: > > i'm facing problem with submitting topology in distributed mode > storm-0.10.2 zookeeper-3.4.6 > > Exception in thread "main" java.lang.RuntimeException: > org.apache.thrift7.transport.TTransportException: > java.net.ConnectException: Connection refused (Connection refused) > at > backtype.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:59) > at > backtype.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:51) > at > backtype.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:103) > at backtype.storm.security.auth.ThriftClient.(ThriftClient.java:72) > at backtype.storm.utils.DRPCClient.(DRPCClient.java:44) > > what should i check for fixing this problem ?
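For reference, a minimal storm.yaml fragment covering the two settings involved might look like this on Storm 0.10.x (the IP address is a placeholder for your actual master node):

```yaml
# Placeholder host; replace with the machine running nimbus and the DRPC server.
nimbus.host: "192.168.1.4"
drpc.servers:
  - "192.168.1.4"
```

If drpc.servers is missing or points at a host where no DRPC daemon is listening, DRPCClient fails with exactly this kind of Connection refused error.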
Re: Storm Installation issue
May I have a look at your storm.yaml file? Sent from my iPhone > On 19 Jun 2017, at 23:48, Ritesh Singh wrote: > > Hi, > When I m not modifying the yml file then nimbus is starting fine, but when I > m modifying yml file then giving error. plz have a look for your reference > (attach file) > Also, plz send me the details, if I m missing. > Also, how will start multiple superwiser in same VM. > how will we test the failover. > Please send me the simple example. > Also send me the Kafka-storm integration step. > I have followed the basic step mentioned in > http://storm.apache.org/releases/1.0.3/Setting-up-a-Storm-cluster.html > > How to setup: > https://storm.apache.org/releases/1.0.0/nimbus-ha-design.html > And How to test? > -- > Thanks & Regards > Ritesh K Singh > System Architect > Cell-9223529532 >
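In the meantime, a very common cause of nimbus failing to start after editing storm.yaml is a YAML syntax slip. A minimal sketch for a 1.0.x cluster follows; hostnames and the local directory are placeholders, not recommendations:

```yaml
# YAML is whitespace-sensitive: use spaces (never tabs)
# and keep a space after every colon.
storm.zookeeper.servers:
  - "zk-host"
nimbus.seeds: ["nimbus-host"]
storm.local.dir: "/var/storm"
supervisor.slots.ports:
  - 6700
  - 6701
```

As for running multiple supervisors in the same VM (asked in the quoted mail): each supervisor instance needs its own storm.local.dir and a non-overlapping set of supervisor.slots.ports, otherwise they will clash.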
Re: About client session timed out, have not
I managed to trace the source of the error. The first error message appears in nimbus.log: "o.a.s.d.nimbus [INFO] Executor T1-1-1497424747:[8 8] not alive". Nimbus detects that some executors are not alive and therefore makes a reassignment, which causes the workers to restart. I did not find any error in the supervisor or worker logs that would cause an executor to time out. Since my topology is data-intensive, is it possible that the heartbeat messages were not delivered in time, causing the executors to time out on nimbus? How does an executor send heartbeats to nimbus? Is it through ZooKeeper, or transmitted via Netty like a metric tuple? I will first try to solve the problem by increasing the timeout values and will let you know whether it works. Regards, Li Wang On 14 June 2017 at 11:27, Li Wang wrote: > Hi all, > > We deployed a data-intesive topology which involves in a lot of HDFS > access via HDFS client. We found that after the topology has been executed > for about half an hour, the topology throughput occasionally drops to zero > for tens of seconds and sometimes the worker is shutdown without any error > messages. > > I checked the log thoroughly, found nothing wrong but a info message that > reads “ClientCnxn [INFO] Client session timed out, have not head from > server in 1ms for sessioned …”. I am not sure how this message is > related to the wired behavior of my topology. But every time my topology > behaves abnormally, this message happens to show up in the log. > > Any help or suggestion is highly appreciated. > > Thanks, > Li Wang. > > >
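For anyone following along, the timeout knobs mentioned above live in storm.yaml. The values below are purely illustrative (roughly the defaults raised, not tuned recommendations):

```yaml
# Illustrative values only; tune for your cluster and load.
storm.zookeeper.session.timeout: 30000   # ms; ZooKeeper client session timeout
nimbus.task.timeout.secs: 60             # how long nimbus waits before declaring an executor not alive
supervisor.worker.timeout.secs: 60       # how long the supervisor waits before restarting a worker
```

Raising these gives heartbeats more slack on a heavily loaded cluster, at the cost of slower detection of genuinely dead executors.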
About client session timed out, have not
Hi all, We deployed a data-intensive topology which involves a lot of HDFS access via the HDFS client. We found that after the topology has been running for about half an hour, its throughput occasionally drops to zero for tens of seconds, and sometimes a worker is shut down without any error message. I checked the logs thoroughly and found nothing wrong except an info message that reads “ClientCnxn [INFO] Client session timed out, have not heard from server in 1ms for sessionid …”. I am not sure how this message is related to the weird behavior of my topology, but every time the topology behaves abnormally, this message shows up in the log. Any help or suggestion is highly appreciated. Thanks, Li Wang.
Re: Integration between python and java in storm
Storm has a ShellBolt that lets you process tuples in Python; this might be what you need. Sent from my iPhone > On 17 Dec 2016, at 7:35 AM, sam mohel wrote: > > I have a project using storm written in JAVA . i want to exchange the > algorithm that the project used it to another BUT the other algorithm > written in Python . > > Can i do this exchange ? or Should i re-write whole project with python ? > > Any Help will be appreciating .. Thanks
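A minimal sketch of the ShellBolt approach follows. The class name, script name, and output field are hypothetical; the Python script must live under resources/ in the topology jar and speak Storm's multilang protocol (for example via the storm.py module shipped with Storm). In pre-1.0 releases the packages are backtype.storm.* rather than org.apache.storm.*.

```java
import org.apache.storm.task.ShellBolt;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import java.util.Map;

// Wraps a Python script as a Storm bolt via the multilang protocol.
// Tuples sent to this bolt are forwarded to the script's stdin; the
// script's emitted tuples come back over its stdout.
public class PythonAlgorithmBolt extends ShellBolt implements IRichBolt {
    public PythonAlgorithmBolt() {
        super("python", "mybolt.py"); // hypothetical script name
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("result")); // hypothetical output field
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
```

You would then wire PythonAlgorithmBolt into the existing Java topology in place of the old algorithm's bolt, so only that one component needs rewriting in Python, not the whole project.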
Re: Spout failures too high
Hi Pradeep, Your config looks fine to me. Could you please check the logs to see what causes the failures? Note that spout failures without corresponding bolt failures usually mean tuples were not fully acked within the message timeout (240 seconds in your config), so it is worth looking for slow or stuck downstream processing. Regards, Li Sent from my iPhone > On 10 Dec 2016, at 11:26 AM, pradeep s wrote: > > Hi, > We are running a 5 node cluster(2x large ec2) with below config. Topology > is consuming message from SQS and writing to RDS and S3. Even if there are > no bolt failures , we are seeing many spout failures. > Can you please help in checking the config. Also i am setting tasks as > parallelism count * 2. Is this fine? > > TOPOLOGY_NAME=MDP_STORM_PRD > MARIA_BOLT_PARALLELISM=50 > S3_BOLT_PARALLELISM=50 > SQS_DELETE_BOLT_PARALLELISM=100 > SPOUT_PARALLELISM=50 > NUMBER_OF_WORKERS=5 > NUMBER_OF_ACKERS=5 > SPOUT_MAX_PENDING=5000 > MESSAGE_TIMEOUT_SECONDS=240 > > > Topology Code > = > > Config config = new Config(); > config.setNumWorkers(numWorkers); > config.setDebug(false); > config.setNumAckers(numAckers); > config.setMaxSpoutPending(maxSpoutPending); > config.setMessageTimeoutSecs(messageTimeoutSecs); > > > topologyBuilder.setSpout(spoutId, new > SQSMessageReaderSpout(sqsUtils.getSQSUrl(dataQueue), properties), >spoutParallelism).setNumTasks(spoutParallelism * TWO); > topologyBuilder.setBolt(mariaBoltId, new MariaDbBolt(properties), > mariaBoltParallelism) >.setNumTasks(mariaBoltParallelism * TWO).fieldsGrouping(spoutId, > new Fields(MESSAGE_ID)); > > topologyBuilder.setBolt(s3BoltId, new S3WriteBolt(properties, > s3Properties), s3BoltParallelism) >.setNumTasks(s3BoltParallelism * TWO).shuffleGrouping(mariaBoltId); > > topologyBuilder >.setBolt(sqsDeleteBoltId, new > SQSMessageDeleteBolt(sqsUtils.getSQSUrl(dataQueue)), sqsBoltParallelism) >.setNumTasks(sqsBoltParallelism * TWO).shuffleGrouping(s3BoltId); > > StormSubmitter.submitTopology(topologyName, config, > topologyBuilder.createTopology()); > > > Regards > > Pradeep S
How does the control flow in a Trident Topology work?
Hi guys, I am trying to understand the implementation of Trident. From reading the code in TridentTopologyBuilder.java, I understand that some coordinator components, such as MasterBatchCoordinator and TridentSpoutCoordinator, are added to a user-defined topology in TridentTopologyBuilder.createTopology(). I have tried to understand the control flow of these coordinators, but it seems very difficult to get the full picture from the source code alone. Is there any document giving a high-level overview of the control flow of the coordinator components in a Trident topology? Any help is highly appreciated. Thanks! Sincerely, Li Wang
Re: Topology submision Bugs
Hi WA, You must map the hostname of the nimbus node to an IP address that is reachable from the supervisor nodes, rather than 127.0.0.1. For instance, assume the hostname of the nimbus node is nimbus_node and its IP address is 192.168.1.4. Then the /etc/hosts files on both the nimbus node and the supervisor nodes should contain: 192.168.1.4 nimbus_node (note that /etc/hosts lists the IP address first, then the hostname). If the problem still cannot be solved, please attach the /etc/hosts from your nimbus node and from any supervisor node. Thanks, Li On 5 July 2016 at 17:05, Walid Aljoby wrote: > Thank you. > I used IP address directly but the problem still the same. > Regards, > WA > > From: Gmail > To: dev@storm.apache.org > Cc: User > Sent: Tuesday, July 5, 2016 7:05 AM > Subject: Re: Topology submision Bugs > > Hi > > Please make sure the host name wf-ubuntun is properly configured in > /etc/hosts on the nodes. > > Hope this helps. > > Li > > Sent from my iPhone > > > On 5 Jul 2016, at 5:27 AM, Walid Aljoby > wrote: > > > > Hi all, > > I am running Apache Storm 1.0.1 over clusters of two nodes (one is > called Master which runs zookeeper and nimbus, the other one runs > supervisor).I just configured the storm.zookeeper.servers: and > nimbus.seeds: to be the IP address of Master node for both two machines. 
> > However, when I submitted the topology through the Master node like > this: storm jar target/storm-starter-1.0.1.jar > org.apache.storm.starter.WordCountTopology DemoI came across into the > following Exceptions: > > 790 [main] INFO o.a.s.StormSubmitter - Generated ZooKeeper secret > payload for MD5-digest: -6025945650128968937:-8258756050733197469 > > 866 [main] INFO o.a.s.s.a.AuthUtils - Got AutoCreds [] > > Exception in thread "main" java.lang.RuntimeException: > org.apache.storm.thrift.transport.TTransportException: > java.net.UnknownHostException: wf-ubuntun > >at > org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64) > >at > org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56) > >at > org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:99) > >at > org.apache.storm.security.auth.ThriftClient.(ThriftClient.java:69) > >at org.apache.storm.utils.NimbusClient.(NimbusClient.java:106) > >at > org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:78) > >at > org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:228) > >at > org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:288) > >at > org.apache.storm.StormSubmitter.submitTopologyWithProgressBar(StormSubmitter.java:324) > >at > org.apache.storm.StormSubmitter.submitTopologyWithProgressBar(StormSubmitter.java:305) > >at > org.apache.storm.starter.WordCountTopology.main(WordCountTopology.java:93) > > Caused by: org.apache.storm.thrift.transport.TTransportException: > java.net.UnknownHostException: wf-ubuntun > >at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226) > >at > org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) > >at > org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:103) > >at > org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) > >... 
9 more > > Caused by: java.net.UnknownHostException: wf-ubuntun > >at > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) > >at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > >at java.net.Socket.connect(Socket.java:579) > >at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221) > >... 12 more > > > > > > > > > > Please I hope anyone can help, and thank you in advance.. > > > > Best Regards--- > > WA > > >
Re: Different topologies need different schedulers
Hi, Under the current scheduler implementation, I am afraid this cannot be achieved. Could you please explain why you want different topologies to be scheduled by different schedulers? Thanks. Li > On 14 Mar 2016, at 10:51 AM, devopts wrote: > > Hi all, >I think it is a issue that all topologies only is scheduled by the same > scheduler. > And it doesn't work when i set the "storm.scheduler" value in my > topology,such as > config.put(Config.STORM_SCHEDULE,xxx.xxx). > Now ,different topologies needs to be scheduled by different schedulers. > And how to solve the problem?
[jira] [Created] (STORM-1057) Add throughput metric to spout/bolt and display them on web ui
Li Wang created STORM-1057: -- Summary: Add throughput metric to spout/bolt and display them on web ui Key: STORM-1057 URL: https://issues.apache.org/jira/browse/STORM-1057 Project: Apache Storm Issue Type: New Feature Reporter: Li Wang Assignee: Li Wang Throughput is a fundamental metric for reasoning about the performance bottlenecks of a topology. Displaying the throughput of components and tasks on the web UI would greatly help users identify performance bottlenecks and check whether the workload among components and tasks is balanced. What to do: 1. Measure the throughput of each spout/bolt. 2. Display the throughput metrics on the web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (STORM-1007) Add more metrics to DisruptorQueue
[ https://issues.apache.org/jira/browse/STORM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804850#comment-14804850 ] Li Wang edited comment on STORM-1007 at 9/18/15 2:08 AM: - Thanks for your reviewing and merging the code. It's my pleasure to contribute to Storm. was (Author: wangli1426): Thanks for your reviewing the merging the code. It's my pleasure to contribute to Storm. > Add more metrics to DisruptorQueue > -- > > Key: STORM-1007 > URL: https://issues.apache.org/jira/browse/STORM-1007 > Project: Apache Storm > Issue Type: New Feature >Reporter: Li Wang >Assignee: Li Wang > Fix For: 0.11.0 > > Original Estimate: 672h > Remaining Estimate: 672h > > The metrics of the queues for each component are very helpful to reason about > the performance issues of a topology. > For instance, for the applications with strong time constraint (e.g., threat > detection), if the elements in the input queue of a bolt have a long sojourn > time, it indicates that the user should increase the parallelism of that bolt > to reduce the processing delay. > However, the metrics on the DisruptorQueue currently available are limited. > More useful metrics, such as average sojourn time and average throughput are > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1007) Add more metrics to DisruptorQueue
[ https://issues.apache.org/jira/browse/STORM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804850#comment-14804850 ] Li Wang commented on STORM-1007: Thanks for reviewing and merging the code. It's my pleasure to contribute to Storm. > Add more metrics to DisruptorQueue > -- > > Key: STORM-1007 > URL: https://issues.apache.org/jira/browse/STORM-1007 > Project: Apache Storm > Issue Type: New Feature >Reporter: Li Wang >Assignee: Li Wang > Fix For: 0.11.0 > > Original Estimate: 672h > Remaining Estimate: 672h > > The metrics of the queues for each component are very helpful to reason about > the performance issues of a topology. > For instance, for the applications with strong time constraint (e.g., threat > detection), if the elements in the input queue of a bolt have a long sojourn > time, it indicates that the user should increase the parallelism of that bolt > to reduce the processing delay. > However, the metrics on the DisruptorQueue currently available are limited. > More useful metrics, such as average sojourn time and average throughput are > expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1008) Isolate the code for metric collection and retrieval from the DisruptorQueue
Li Wang created STORM-1008: -- Summary: Isolate the code for metric collection and retrieval from the DisruptorQueue Key: STORM-1008 URL: https://issues.apache.org/jira/browse/STORM-1008 Project: Apache Storm Issue Type: Sub-task Reporter: Li Wang Assignee: Li Wang Fix For: 0.11.0 The code for collecting and retrieving the metrics of the queue is currently mixed into DisruptorQueue. It is better to isolate metric collection and retrieval from the implementation of DisruptorQueue, for better extensibility and readability of the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1007) Add more metrics to DisruptorQueue
Li Wang created STORM-1007: -- Summary: Add more metrics to DisruptorQueue Key: STORM-1007 URL: https://issues.apache.org/jira/browse/STORM-1007 Project: Apache Storm Issue Type: Bug Reporter: Li Wang Assignee: Li Wang The metrics of the queues for each component are very helpful to reason about the performance issues of a topology. For instance, for the applications with strong time constraint (e.g., threat detection), if the elements in the input queue of a bolt have a long sojourn time, it indicates that the user should increase the parallelism of that bolt to reduce the processing delay. However, the metrics on the DisruptorQueue currently available are limited. More useful metrics, such as average sojourn time and average throughput are expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
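To make the two requested metrics concrete, here is an illustrative plain-Java sketch (not Storm's actual DisruptorQueue, and not thread-safe) of how average sojourn time and throughput can be measured on a queue by timestamping elements at enqueue:

```java
import java.util.ArrayDeque;

// Illustrative metered queue: records when each element was enqueued,
// accumulates sojourn time at dequeue, and derives throughput from the
// dequeue count over the queue's lifetime.
class MeteredQueue<T> {
    private static final class Entry<T> {
        final T value;
        final long enqueuedAtNanos;
        Entry(T value, long t) { this.value = value; this.enqueuedAtNanos = t; }
    }

    private final ArrayDeque<Entry<T>> queue = new ArrayDeque<>();
    private long totalSojournNanos = 0;
    private long dequeueCount = 0;
    private final long startNanos = System.nanoTime();

    void put(T value) {
        queue.addLast(new Entry<>(value, System.nanoTime()));
    }

    T take() {
        Entry<T> e = queue.removeFirst();
        totalSojournNanos += System.nanoTime() - e.enqueuedAtNanos;
        dequeueCount++;
        return e.value;
    }

    // Average time an element spent waiting in the queue, in milliseconds.
    double avgSojournMillis() {
        return dequeueCount == 0 ? 0 : (totalSojournNanos / 1e6) / dequeueCount;
    }

    // Elements dequeued per second since the queue was created.
    double throughputPerSec() {
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        return elapsedSec == 0 ? 0 : dequeueCount / elapsedSec;
    }
}
```

A long average sojourn time on a bolt's input queue is exactly the signal described in the issue that the bolt's parallelism should be increased.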
[jira] [Commented] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event
[ https://issues.apache.org/jira/browse/STORM-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708393#comment-14708393 ] Li Wang commented on STORM-992: --- This bug has been fixed in https://github.com/apache/storm/pull/680 and is awaiting merge. I believe this fix prevents potential scheduling problems, so please consider merging it. > A bug in the timer.clj might cause unexpected delay to schedule new event > - > > Key: STORM-992 > URL: https://issues.apache.org/jira/browse/STORM-992 > Project: Apache Storm > Issue Type: Bug > Reporter: Li Wang >Assignee: Li Wang > Original Estimate: 10m > Remaining Estimate: 10m > > The timer thread calculates the delay to schedule the head of the queue and > sleeps accordingly. However, if a new event with high priority is inserted at > the head of the queue during the sleep, this event cannot be scheduled in > time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event
[ https://issues.apache.org/jira/browse/STORM-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Wang resolved STORM-992. --- Resolution: Fixed This bug has been fixed in https://github.com/apache/storm/pull/680. Please consider merging it. Thanks. > A bug in the timer.clj might cause unexpected delay to schedule new event > - > > Key: STORM-992 > URL: https://issues.apache.org/jira/browse/STORM-992 > Project: Apache Storm > Issue Type: Bug > Reporter: Li Wang > Assignee: Li Wang > Original Estimate: 10m > Remaining Estimate: 10m > > The timer thread calculates the delay to schedule the head of the queue and > sleeps accordingly. However, if a new event with high priority is inserted at > the head of the queue during the sleep, this event cannot be scheduled in > time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event
Li Wang created STORM-992: - Summary: A bug in the timer.clj might cause unexpected delay to schedule new event Key: STORM-992 URL: https://issues.apache.org/jira/browse/STORM-992 Project: Apache Storm Issue Type: Bug Reporter: Li Wang Assignee: Li Wang The timer thread calculates the delay to schedule the head of the queue and sleeps accordingly. However, if a new event with high priority is inserted at the head of the queue during the sleep, this event cannot be scheduled in time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
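To illustrate the general fix pattern for the bug described above (this is a minimal plain-Java sketch, not Storm's actual timer.clj implementation): instead of sleeping unconditionally for the computed delay, the timer thread waits on a lock that insertions notify, so an earlier-deadline event inserted mid-wait wakes the thread immediately and the sleep duration is recomputed.

```java
import java.util.PriorityQueue;

// Minimal timer sketch: schedule() wakes the waiting thread so that a
// newly inserted earlier deadline is never stuck behind a stale sleep.
class SimpleTimer {
    private final PriorityQueue<Long> deadlines = new PriorityQueue<>();
    private final Object lock = new Object();

    void schedule(long deadlineMillis) {
        synchronized (lock) {
            deadlines.offer(deadlineMillis);
            lock.notify(); // wake the timer so it recomputes its wait
        }
    }

    // Blocks until the earliest deadline has passed, then returns it.
    Long takeExpired() throws InterruptedException {
        synchronized (lock) {
            while (true) {
                Long head = deadlines.peek();
                long now = System.currentTimeMillis();
                if (head != null && head <= now) {
                    return deadlines.poll();
                }
                // Wait only as long as the current head requires;
                // 0 means wait indefinitely until something is scheduled.
                long delay = (head == null) ? 0 : head - now;
                lock.wait(delay);
            }
        }
    }
}
```

With a plain Thread.sleep in place of lock.wait, an event scheduled for an earlier time during the sleep would not be seen until the old sleep expired, which is exactly the delay the issue reports.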