Re: java.lang.OutOfMemoryError

2017-07-23 Thread Li Wang
In Storm, an OOM exception is typically caused by inadvertently accumulating 
references to objects that should have been freed. This kind of bug can be 
pinpointed and solved by analyzing the heap dump file. In particular, when 
OOM happens on a particular worker, a dump file is generated (in the same 
folder as the storm binary, e.g., storm/bin/), which is an instantaneous memory 
snapshot. You can use a dump analysis tool such as VisualVM to see whether any 
class has an unexpectedly large number of instances. If so, it is highly 
likely that you forgot to remove the references to those instances after use. 
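
If a heap dump is not being produced on your setup, you can ask each worker JVM to write one on OOM via worker.childopts in storm.yaml. A sketch; the heap size and dump path below are illustrative, not recommendations:

```yaml
# Illustrative values only. worker.childopts is passed to every worker JVM.
# Make sure the dump directory exists and is writable by the storm user.
worker.childopts: "-Xmx2g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/storm_dumps"
```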

Hope this helps. 

Li Wang

Sent from my iPhone

> On 24 Jul 2017, at 08:49, sam mohel  wrote:
> 
> I got this error after submitting the topology:
> java.lang.OutOfMemoryError: Java heap space at
> cern.colt.matrix.impl.DenseDoubleMatrix1D.<init>(Unknown Source) at
> trident.state.BucketsDB.updateRandomVectors(BucketsDB.java:126) at
> trident.state.q
> 
> This error appeared 8 hours after I started getting results.
> How can I solve it?


Re: Connection refused

2017-07-17 Thread Li Wang
Make sure the nimbus is running properly. 
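
A first thing to check is whether the nimbus thrift port is reachable from the machine you submit from. A minimal sketch, assuming the default nimbus.thrift.port of 6627 (the host below is a placeholder):

```shell
# Placeholder host; replace with your nimbus host name or IP.
NIMBUS_HOST=127.0.0.1
NIMBUS_PORT=6627   # default value of nimbus.thrift.port
if nc -z -w 2 "$NIMBUS_HOST" "$NIMBUS_PORT" 2>/dev/null; then
  echo "nimbus port reachable"
else
  echo "nimbus port NOT reachable - check that nimbus is running and that host/port are correct"
fi
```

"Connection refused" specifically means the host resolved but nothing accepted the connection, so also check the nimbus process and its logs on the nimbus machine.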

Sent from my iPhone

> On 18 Jul 2017, at 09:16, sam mohel  wrote:
> 
> I'm facing a problem with submitting a topology in distributed mode
> (storm-0.10.2, zookeeper-3.4.6)
> 
> Exception in thread "main" java.lang.RuntimeException:
> org.apache.thrift7.transport.TTransportException:
> java.net.ConnectException: Connection refused (Connection refused)
> at
> backtype.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:59)
> at
> backtype.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:51)
> at
> backtype.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:103)
> at backtype.storm.security.auth.ThriftClient.<init>(ThriftClient.java:72)
> at backtype.storm.utils.DRPCClient.<init>(DRPCClient.java:44)
> 
> What should I check to fix this problem?


Re: Storm Installation issue

2017-06-19 Thread Li Wang
May I have a look at your storm.yaml file?

Sent from my iPhone

> On 19 Jun 2017, at 23:48, Ritesh Singh  wrote:
> 
> Hi, 
> When I'm not modifying the yaml file, nimbus starts fine, but when I 
> modify the yaml file it gives an error. Please have a look for your reference 
> (attached file).
> Also, please send me the details of anything I'm missing. 
> Also, how do I start multiple supervisors in the same VM?
> How do we test failover?
> Please send me a simple example.
> Also send me the Kafka-Storm integration steps.
> I have followed the basic step mentioned in 
> http://storm.apache.org/releases/1.0.3/Setting-up-a-Storm-cluster.html
> 
> How to setup:
> https://storm.apache.org/releases/1.0.0/nimbus-ha-design.html
> And How to test?
> -- 
> Thanks & Regards
> Ritesh K Singh
> System Architect
> Cell-9223529532
> 


Re: About client session timed out, have not

2017-06-14 Thread Li Wang
I managed to track down the source of the error. The first error message is
in nimbus.log: "o.a.s.d.nimbus [INFO] Executor T1-1-1497424747:[8 8] not
alive". The nimbus detects that some executors are not alive and thus makes a
reassignment, which causes the worker to restart.

I did not find any error in the supervisor and worker logs that would
cause an executor to time out. Since my topology is data-intensive, is
it possible that the heartbeat messages were not delivered in time and thus
nimbus considered the executors timed out? How does an executor send its
heartbeat to nimbus? Is it through zookeeper, or is it transmitted via
netty like a metric tuple?

I will first try to solve the problem by increasing the timeout values and will
let you guys know if it works.
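
For reference, the timeouts involved in liveness detection can be raised in storm.yaml. A sketch with illustrative values (the keys are standard Storm configuration; the right values depend on your cluster):

```yaml
# Illustrative values - tune for your workload.
storm.zookeeper.session.timeout: 30000   # ms before zookeeper expires a client session
supervisor.worker.timeout.secs: 60       # how long a supervisor waits for worker heartbeats
nimbus.task.timeout.secs: 60             # how long nimbus waits before marking an executor "not alive"
```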

Regards,
Li Wang

On 14 June 2017 at 11:27, Li Wang  wrote:

> Hi all,
>
> We deployed a data-intensive topology which involves a lot of HDFS
> access via an HDFS client. We found that after the topology has been running
> for about half an hour, its throughput occasionally drops to zero
> for tens of seconds, and sometimes a worker is shut down without any error
> messages.
>
> I checked the logs thoroughly and found nothing wrong except an info message that
> reads “ClientCnxn [INFO] Client session timed out, have not heard from
> server in 1ms for sessionid …”. I am not sure how this message is
> related to the weird behavior of my topology, but every time my topology
> behaves abnormally, this message shows up in the log.
>
> Any help or suggestion is highly appreciated.
>
> Thanks,
> Li Wang.
>
>
>


About client session timed out, have not

2017-06-13 Thread Li Wang
Hi all,

We deployed a data-intensive topology which involves a lot of HDFS access via an 
HDFS client. We found that after the topology has been running for about half 
an hour, its throughput occasionally drops to zero for tens of seconds, 
and sometimes a worker is shut down without any error messages. 

I checked the logs thoroughly and found nothing wrong except an info message that 
reads “ClientCnxn [INFO] Client session timed out, have not heard from server in 
1ms for sessionid …”. I am not sure how this message is related to the 
weird behavior of my topology, but every time my topology behaves abnormally, 
this message shows up in the log.

Any help or suggestion is highly appreciated.

Thanks,
Li Wang.




Re: Integration between python and java in storm

2016-12-16 Thread Li Wang
Storm provides a ShellBolt for processing tuples in Python. This might be what you 
need. 
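
Storm's multilang support lets a Java topology delegate a bolt's logic to a Python script via ShellBolt. A sketch along the lines of storm-starter's word-count example (it assumes storm-core on the classpath; splitsentence.py is a hypothetical script speaking Storm's multilang protocol that would ship in the jar's resources/ directory):

```java
import java.util.Map;

import org.apache.storm.task.ShellBolt;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;

// A bolt whose execute logic lives in a Python script rather than in Java.
public class SplitSentence extends ShellBolt implements IRichBolt {

    public SplitSentence() {
        // Run "python splitsentence.py"; the script speaks Storm's multilang
        // protocol (the storm.py helper module handles the framing).
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Output fields are still declared on the Java side.
        declarer.declare(new Fields("word"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
```

So you could keep the Java topology and wrap only the algorithm in a Python shell bolt, rather than rewriting the whole project. Note that the multilang round trip serializes every tuple to the Python process and back, which has a throughput cost.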


Sent from my iPhone

> On 17 Dec 2016, at 7:35 AM, sam mohel  wrote:
> 
> I have a project using Storm written in Java. I want to swap the
> algorithm that the project uses for another one, BUT the other algorithm is
> written in Python.
> 
> Can I do this swap, or should I re-write the whole project in Python?
> 
> Any help will be appreciated. Thanks


Re: Spout failures too high

2016-12-09 Thread Li Wang
Hi Pradeep,

Your config looks fine to me. Could you please check the log and see what 
causes the failure?

Regards,
Li


Sent from my iPhone

> On 10 Dec 2016, at 11:26 AM, pradeep s  wrote:
> 
> Hi,
> We are running a 5-node cluster (2x large EC2) with the config below. The topology
> consumes messages from SQS and writes to RDS and S3. Even though there are
> no bolt failures, we are seeing many spout failures.
> Can you please help check the config? Also, I am setting tasks as
> parallelism count * 2. Is this fine?
> 
> TOPOLOGY_NAME=MDP_STORM_PRD
> MARIA_BOLT_PARALLELISM=50
> S3_BOLT_PARALLELISM=50
> SQS_DELETE_BOLT_PARALLELISM=100
> SPOUT_PARALLELISM=50
> NUMBER_OF_WORKERS=5
> NUMBER_OF_ACKERS=5
> SPOUT_MAX_PENDING=5000
> MESSAGE_TIMEOUT_SECONDS=240
> 
> 
> Topology Code
> =
> 
> Config config = new Config();
> config.setNumWorkers(numWorkers);
> config.setDebug(false);
> config.setNumAckers(numAckers);
> config.setMaxSpoutPending(maxSpoutPending);
> config.setMessageTimeoutSecs(messageTimeoutSecs);
> 
> 
> topologyBuilder.setSpout(spoutId, new
> SQSMessageReaderSpout(sqsUtils.getSQSUrl(dataQueue), properties),
>spoutParallelism).*setNumTasks(spoutParallelism * TWO);*
> topologyBuilder.setBolt(mariaBoltId, new MariaDbBolt(properties),
> mariaBoltParallelism)
>.*setNumTasks(mariaBoltParallelism * TWO)*.fieldsGrouping(spoutId,
> new Fields(MESSAGE_ID));
> 
> topologyBuilder.setBolt(s3BoltId, new S3WriteBolt(properties,
> s3Properties), s3BoltParallelism)
>.*setNumTasks(s3BoltParallelism * TWO).*
> shuffleGrouping(mariaBoltId);
> 
> topologyBuilder
>.setBolt(sqsDeleteBoltId, new
> SQSMessageDeleteBolt(sqsUtils.getSQSUrl(dataQueue)), sqsBoltParallelism)
>.*setNumTasks(sqsBoltParallelism * TWO)*.shuffleGrouping(s3BoltId);
> 
> StormSubmitter.submitTopology(topologyName, config,
> topologyBuilder.createTopology());
> 
> 
> Regards
> 
> Pradeep S


How does the control flow in a Trident Topology work?

2016-11-02 Thread Li Wang
Hi guys,

I am trying to understand the implementation of Trident. From reading the 
code in TridentTopologyBuilder.java, I understand that some coordinator 
components, such as MasterBatchCoordinator and TridentSpoutCoordinator, are 
added to a user-defined topology in TridentTopologyBuilder.createTopology(). I 
am trying to understand the control flow of those coordinators, but it seems to be 
very difficult to get a sense of it from the source code alone. Is there any document 
giving a high-level description of the control flow of the coordinator components in a 
Trident topology?

Any help is highly appreciated. Thanks!

Sincerely,
Li Wang

Re: Topology submision Bugs

2016-07-06 Thread Li Wang
Hi WA,

The host name of the nimbus node must resolve, on the supervisor nodes, to an
IP address they can reach rather than to 127.0.0.1.

For instance, assume the host name of the nimbus node is nimbus_node and its
IP address is 192.168.1.4. Then the /etc/hosts files on both the nimbus
node and the supervisor nodes should contain:
 192.168.1.4 nimbus_node

If the problem still cannot be solved, please attach the /etc/hosts from
your nimbus node and from any supervisor node.
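
To double-check how the name resolves on a given node, something like this can help (nimbus_node is the example host name from above; substitute your own):

```shell
# Print how nimbus_node resolves on this node, or a note if it does not resolve.
getent hosts nimbus_node || echo "nimbus_node does not resolve on this node"
```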

Thanks,
Li


On 5 July 2016 at 17:05, Walid Aljoby 
wrote:

> Thank you.
> I used IP address directly but the problem  still the same.
> Regards,
> WA
>
>   From: Gmail 
>  To: dev@storm.apache.org
> Cc: User 
>  Sent: Tuesday, July 5, 2016 7:05 AM
>  Subject: Re: Topology submision Bugs
>
> Hi
>
> Please make sure the host name wf-ubuntun is properly configured in
> /etc/hosts on the nodes.
>
> Hope this helps.
>
> Li
>
> Sent from my iPhone
>
> > On 5 Jul 2016, at 5:27 AM, Walid Aljoby 
> wrote:
> >
> > Hi all,
> > I am running Apache Storm 1.0.1 over a cluster of two nodes (one is
> > called Master, which runs zookeeper and nimbus; the other runs a
> > supervisor). I configured storm.zookeeper.servers: and
> > nimbus.seeds: to be the IP address of the Master node on both machines.
> > However, when I submitted the topology from the Master node like
> > this: storm jar target/storm-starter-1.0.1.jar
> > org.apache.storm.starter.WordCountTopology Demo
> > I ran into the following exceptions:
> > 790  [main] INFO  o.a.s.StormSubmitter - Generated ZooKeeper secret
> payload for MD5-digest: -6025945650128968937:-8258756050733197469
> > 866  [main] INFO  o.a.s.s.a.AuthUtils - Got AutoCreds []
> > Exception in thread "main" java.lang.RuntimeException:
> org.apache.storm.thrift.transport.TTransportException:
> java.net.UnknownHostException: wf-ubuntun
> >at
> org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64)
> >at
> org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56)
> >at
> org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:99)
> >at
> org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69)
> >at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:106)
> >at
> org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:78)
> >at
> org.apache.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:228)
> >at
> org.apache.storm.StormSubmitter.submitTopology(StormSubmitter.java:288)
> >at
> org.apache.storm.StormSubmitter.submitTopologyWithProgressBar(StormSubmitter.java:324)
> >at
> org.apache.storm.StormSubmitter.submitTopologyWithProgressBar(StormSubmitter.java:305)
> >at
> org.apache.storm.starter.WordCountTopology.main(WordCountTopology.java:93)
> > Caused by: org.apache.storm.thrift.transport.TTransportException:
> java.net.UnknownHostException: wf-ubuntun
> >at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226)
> >at
> org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
> >at
> org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:103)
> >at
> org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53)
> >... 9 more
> > Caused by: java.net.UnknownHostException: wf-ubuntun
> >at
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
> >at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> >at java.net.Socket.connect(Socket.java:579)
> >at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221)
> >... 12 more
> >
> >
> >
> >
> > Please I hope anyone can help, and thank you in advance..
> >
> > Best Regards---
> > WA
>
>
>


Re: Different topologies need different schedulers

2016-03-14 Thread Li Wang
Hi,
Under the current scheduler implementation, I am afraid this cannot be achieved. 
Could you please explain why you want different topologies to be 
scheduled by different schedulers? 
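
For reference, the scheduler is a single cluster-wide setting read by nimbus from storm.yaml, which is why a per-topology config.put has no effect. A sketch (the class name is a placeholder for an IScheduler implementation on nimbus's classpath):

```yaml
# Cluster-wide: one scheduler instance schedules all topologies.
storm.scheduler: "com.example.MyScheduler"
```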

Thanks.
Li


> On 14 Mar 2016, at 10:51 AM, devopts  wrote:
> 
> Hi all,
>I think it is an issue that all topologies can only be scheduled by the same 
> scheduler.
> It also doesn't work when I set the "storm.scheduler" value in my 
> topology, such as 
> config.put(Config.STORM_SCHEDULER, xxx.xxx).
> Different topologies need to be scheduled by different schedulers.
> How can this problem be solved?



[jira] [Created] (STORM-1057) Add throughput metric to spout/bolt and display them on web ui

2015-09-20 Thread Li Wang (JIRA)
Li Wang created STORM-1057:
--

 Summary: Add throughput metric to spout/bolt and display them on 
web ui
 Key: STORM-1057
 URL: https://issues.apache.org/jira/browse/STORM-1057
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Li Wang
Assignee: Li Wang


Throughput is a fundamental metric for reasoning about the performance 
bottleneck of a topology. Displaying the throughput of components and tasks on 
the web UI would greatly help the user identify the performance 
bottleneck and check whether the workload among components and tasks is 
balanced. 

What to do:
1. Measure the throughput of each spout/bolt.
2. Display the throughput metrics on web UI.
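
Step 1 can be sketched as a minimal, Storm-independent throughput counter; this is illustrative only and not Storm's actual metrics API:

```java
// Counts events and reports a rate over the elapsed window.
class ThroughputMeter {
    private long count = 0;
    private long windowStartNanos = System.nanoTime();

    // Call once per emitted/executed tuple.
    void mark() {
        count++;
    }

    // Events per second since the window started.
    double ratePerSec() {
        double elapsedSecs = (System.nanoTime() - windowStartNanos) / 1e9;
        return elapsedSecs > 0 ? count / elapsedSecs : 0.0;
    }

    // Start a fresh measurement window.
    void reset() {
        count = 0;
        windowStartNanos = System.nanoTime();
    }
}
```

In Storm itself, such a counter would be kept per executor and its value published through the metrics path so the web UI can display it.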



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (STORM-1007) Add more metrics to DisruptorQueue

2015-09-17 Thread Li Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804850#comment-14804850
 ] 

Li Wang edited comment on STORM-1007 at 9/18/15 2:08 AM:
-

Thanks for reviewing and merging the code. It's my pleasure to contribute 
to Storm.


was (Author: wangli1426):
Thanks for your reviewing the merging the code. It's my pleasure to contribute 
to Storm.

> Add more metrics to DisruptorQueue
> --
>
> Key: STORM-1007
> URL: https://issues.apache.org/jira/browse/STORM-1007
> Project: Apache Storm
>  Issue Type: New Feature
>Reporter: Li Wang
>Assignee: Li Wang
> Fix For: 0.11.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The metrics of the queues of each component are very helpful for reasoning about 
> the performance issues of a topology.
> For instance, for applications with a strong time constraint (e.g., threat 
> detection), if the elements in the input queue of a bolt have a long sojourn 
> time, it indicates that the user should increase the parallelism of that bolt 
> to reduce the processing delay.
> However, the metrics currently available on the DisruptorQueue are limited. 
> More useful metrics, such as average sojourn time and average throughput, are 
> needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1007) Add more metrics to DisruptorQueue

2015-09-17 Thread Li Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804850#comment-14804850
 ] 

Li Wang commented on STORM-1007:


Thanks for your reviewing the merging the code. It's my pleasure to contribute 
to Storm.

> Add more metrics to DisruptorQueue
> --
>
> Key: STORM-1007
> URL: https://issues.apache.org/jira/browse/STORM-1007
> Project: Apache Storm
>  Issue Type: New Feature
>Reporter: Li Wang
>Assignee: Li Wang
> Fix For: 0.11.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The metrics of the queues of each component are very helpful for reasoning about 
> the performance issues of a topology.
> For instance, for applications with a strong time constraint (e.g., threat 
> detection), if the elements in the input queue of a bolt have a long sojourn 
> time, it indicates that the user should increase the parallelism of that bolt 
> to reduce the processing delay.
> However, the metrics currently available on the DisruptorQueue are limited. 
> More useful metrics, such as average sojourn time and average throughput, are 
> needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (STORM-1008) Isolate the code for metric collection and retrieval from the DisruptorQueue

2015-08-25 Thread Li Wang (JIRA)
Li Wang created STORM-1008:
--

 Summary: Isolate the code for metric collection and retrieval from 
the DisruptorQueue
 Key: STORM-1008
 URL: https://issues.apache.org/jira/browse/STORM-1008
 Project: Apache Storm
  Issue Type: Sub-task
Reporter: Li Wang
Assignee: Li Wang
 Fix For: 0.11.0


The code for collecting and retrieving the metrics of the queue is mixed into 
DisruptorQueue. 

It would be better to isolate metrics collection and retrieval from the 
implementation of DisruptorQueue, for better extensibility and readability of 
the code.
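
One way to do the isolation is to keep the queue itself metrics-free and wrap it. A minimal illustrative sketch (a plain java.util queue stands in for the Disruptor-backed one; none of these names are Storm's):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Metrics live entirely outside the queue implementation.
class QueueMetrics {
    private long enqueued = 0;
    private long dequeued = 0;

    void onEnqueue() { enqueued++; }
    void onDequeue() { dequeued++; }

    // Current number of elements, derived from the two counters.
    long population() { return enqueued - dequeued; }
}

// Wrapper that adds metrics without touching the underlying queue's code.
class MeteredQueue<T> {
    private final Queue<T> delegate = new ArrayDeque<>();
    private final QueueMetrics metrics = new QueueMetrics();

    void publish(T item) {
        delegate.add(item);
        metrics.onEnqueue();
    }

    T consume() {
        T item = delegate.poll();
        if (item != null) {
            metrics.onDequeue();
        }
        return item;
    }

    QueueMetrics metrics() { return metrics; }
}
```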



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (STORM-1007) Add more metrics to DisruptorQueue

2015-08-25 Thread Li Wang (JIRA)
Li Wang created STORM-1007:
--

 Summary: Add more metrics to DisruptorQueue
 Key: STORM-1007
 URL: https://issues.apache.org/jira/browse/STORM-1007
 Project: Apache Storm
  Issue Type: Bug
Reporter: Li Wang
Assignee: Li Wang


The metrics of the queues of each component are very helpful for reasoning about 
the performance issues of a topology.

For instance, for applications with a strong time constraint (e.g., threat 
detection), if the elements in the input queue of a bolt have a long sojourn 
time, it indicates that the user should increase the parallelism of that bolt 
to reduce the processing delay.

However, the metrics currently available on the DisruptorQueue are limited. 
More useful metrics, such as average sojourn time and average throughput, are 
needed.
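
The average sojourn time mentioned above can be measured by timestamping elements on enqueue and aggregating on dequeue. A minimal illustrative sketch, not Storm's actual DisruptorQueue code:

```java
// Tracks how long elements spend in a queue between enqueue and dequeue.
class SojournTracker {
    private long totalNanos = 0;
    private long dequeuedCount = 0;

    // Call at enqueue time; store the returned timestamp with the element.
    long onEnqueue() {
        return System.nanoTime();
    }

    // Call at dequeue time with the element's enqueue timestamp.
    void onDequeue(long enqueueNanos) {
        totalNanos += System.nanoTime() - enqueueNanos;
        dequeuedCount++;
    }

    // Average sojourn time in milliseconds over all dequeued elements.
    double avgSojournMillis() {
        return dequeuedCount == 0 ? 0.0 : (totalNanos / (double) dequeuedCount) / 1e6;
    }
}
```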



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event

2015-08-23 Thread Li Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708393#comment-14708393
 ] 

Li Wang commented on STORM-992:
---

This bug has been fixed in https://github.com/apache/storm/pull/680 and is 
awaiting merge.
I believe this fix could prevent potential bugs from happening, so please merge 
it.

> A bug in the timer.clj might cause unexpected delay to schedule new event
> -
>
> Key: STORM-992
> URL: https://issues.apache.org/jira/browse/STORM-992
> Project: Apache Storm
>  Issue Type: Bug
>    Reporter: Li Wang
>Assignee: Li Wang
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The timer thread calculates the delay until the head event of the queue is due and 
> sleeps accordingly. However, if a new event with higher priority is inserted at 
> the head of the queue during the sleep, that event cannot be scheduled in 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event

2015-08-13 Thread Li Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Wang resolved STORM-992.
---
Resolution: Fixed

This bug has been fixed in https://github.com/apache/storm/pull/680

Please consider merging it.

Thanks.

> A bug in the timer.clj might cause unexpected delay to schedule new event
> -
>
> Key: STORM-992
> URL: https://issues.apache.org/jira/browse/STORM-992
> Project: Apache Storm
>  Issue Type: Bug
>    Reporter: Li Wang
>    Assignee: Li Wang
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The timer thread calculates the delay until the head event of the queue is due and 
> sleeps accordingly. However, if a new event with higher priority is inserted at 
> the head of the queue during the sleep, that event cannot be scheduled in 
> time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (STORM-992) A bug in the timer.clj might cause unexpected delay to schedule new event

2015-08-13 Thread Li Wang (JIRA)
Li Wang created STORM-992:
-

 Summary: A bug in the timer.clj might cause unexpected delay to 
schedule new event
 Key: STORM-992
 URL: https://issues.apache.org/jira/browse/STORM-992
 Project: Apache Storm
  Issue Type: Bug
Reporter: Li Wang
Assignee: Li Wang


The timer thread calculates the delay until the head event of the queue is due and 
sleeps accordingly. However, if a new event with higher priority is inserted at 
the head of the queue during the sleep, that event cannot be scheduled in time.
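
One fix is to wake the sleeping timer thread whenever a sooner event becomes the new head, so it recomputes its sleep. A simplified illustrative sketch in Java (Storm's actual timer is Clojure; all names here are invented):

```java
import java.util.PriorityQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// A timer whose worker thread sleeps until the head deadline, and is woken
// early when an event with an earlier deadline is inserted.
class SimpleTimer {
    private final PriorityQueue<Long> deadlines = new PriorityQueue<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition headChanged = lock.newCondition();

    void schedule(long deadlineMillis) {
        lock.lock();
        try {
            Long head = deadlines.peek();
            deadlines.add(deadlineMillis);
            // The essence of the fix: if the new event becomes the head,
            // wake the sleeping timer thread so it recomputes its sleep.
            if (head == null || deadlineMillis < head) {
                headChanged.signal();
            }
        } finally {
            lock.unlock();
        }
    }

    // The timer thread sleeps here instead of in an uninterruptible sleep.
    void awaitHeadChange(long maxSleepMillis) throws InterruptedException {
        lock.lock();
        try {
            headChanged.await(maxSleepMillis, TimeUnit.MILLISECONDS);
        } finally {
            lock.unlock();
        }
    }

    // Pop the head event if it is due at the given time, else return null.
    Long pollDue(long nowMillis) {
        lock.lock();
        try {
            Long head = deadlines.peek();
            return (head != null && head <= nowMillis) ? deadlines.poll() : null;
        } finally {
            lock.unlock();
        }
    }
}
```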



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)