In the config I've set 2 workers, each bound to one port (supervisor.slots.ports: [6700, 6701]). I also tried adding 2 more ports, but in every case only 1 port (worker) was actually used, according to Storm UI. Why that happens is another open question.
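One thing I'm not sure about: whether the topology itself also has to request 2 workers, since supervisor.slots.ports only advertises the slots the supervisor offers. A sketch of what I mean, against the backtype.storm API (topology name and wiring are placeholders):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... setSpout/setBolt wiring goes here ...
        Config conf = new Config();
        // the slots config only advertises ports; the topology must
        // also ask for more than the default of a single worker:
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("myTopology", conf, builder.createTopology());
    }
}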

For my spout and 2 bolts I set different parallelism hints (starting from 2 and going up to 100) to see how the topology reacts. I end up with either "unable to create new native thread" or the HBase connection error.
As you said, the best approach would be to find out why so many threads are created and how to close them. It should be independent of the number of tuples I'm sending via the Kafka producer, shouldn't it?
 --- Numan Göceri 

    On Thursday, May 12, 2016 4:26 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:
 

max user processes (-u) 47555 is the limit that is probably being hit, so you probably have 47555+ threads/processes in execution for that user on that host.
I honestly don't know where all of the threads are coming from. Storm will create two threads for each executor, plus a small fixed number more for the system in each worker. By increasing the parallelism hint of your bolts/spouts and increasing the number of workers, the number of threads goes up, and Linux counts threads against your max user processes limit. (For example, a hint of 100 on each of your 3 components means about 300 executors, i.e. roughly 600 threads, before counting the per-worker system threads and whatever the HBase client spawns.)
Most of the time when I run into a situation like this I try to get a jstack or a heap dump of the workers to understand what all of the threads are associated with. My guess is that something is happening with HBase, like you said, where the client is not shutting down all of its threads when you close the table.
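If attaching jstack to the worker is awkward on your VM, a rough alternative (just a sketch, untested) is to log the live threads from inside a bolt via the standard JMX bean and see which names pile up:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {
    // Print every live thread's id, state and name; call this
    // periodically (e.g. from execute()) to see what accumulates.
    public static void dumpThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            System.err.println(info.getThreadId() + " "
                    + info.getThreadState() + " " + info.getThreadName());
        }
        System.err.println("total live threads: " + mx.getThreadCount());
    }
}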
 - Bobby 

    On Thursday, May 12, 2016 9:00 AM, numan goceri <numangoc...@yahoo.com> 
wrote:
 

Hi Bobby,
Thanks for your reply. I've searched the net a lot for what this OutOfMemoryError actually means, and it seems to be an OS resource problem.
Link#1: http://stackoverflow.com/questions/16789288/java-lang-outofmemoryerror-unable-to-create-new-native-thread
Link#2: https://plumbr.eu/outofmemoryerror/unable-to-create-new-native-thread
I'm running this on my VM-Player, and these are the ulimit settings:
[root@my_Project]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 47555
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 47555
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
[root@my_Project]#

I'm guessing Storm is not killing any of the threads that have been opened so far.
If it opens a new thread each time I "put" a result into HBase, then it should also kill that thread, since I call "close" afterwards.
My code basically looks like the following:
prepare() {
    this.connection = ConnectionFactory.createConnection(constructConfiguration());
    this.allTuplesTable = connection.getTable("allTuples");
    this.resultsTable = connection.getTable("results");
}

execute() {
    readFromAllTuples();
    ...
    this.resultsTable.put(put);
    this.resultsTable.close();
}

cleanup() {
    this.allTuplesTable.close();
    this.resultsTable.close();
}
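One thing I keep wondering: whether I should stop closing resultsTable inside execute() and instead keep both tables open, closing the tables and the shared Connection only in cleanup(), since the Connection owns the client's thread pools. A sketch of what I mean, assuming the HBase 1.x client API (class, table and column names are just illustrative):

import java.io.IOException;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ResultsBoltSketch extends BaseRichBolt {
    private transient Connection connection;
    private transient Table resultsTable;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // one Connection per bolt instance; it owns the client thread pools
            connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
            resultsTable = connection.getTable(TableName.valueOf("results"));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            Put put = new Put(Bytes.toBytes(tuple.getString(0)));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("v"),
                          Bytes.toBytes(tuple.getString(1)));
            resultsTable.put(put);  // reuse the open table; no close() per put
            collector.ack(tuple);
        } catch (IOException e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void cleanup() {
        try {
            if (resultsTable != null) resultsTable.close();
            if (connection != null) connection.close();  // stops the client threads
        } catch (IOException ignored) {
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to declare
    }
}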

I'd appreciate it if you could show me a way to continue; the data I'm dealing with shouldn't be too "big" for Storm.


 --- Numan Göceri 

    On Thursday, May 12, 2016 3:31 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:
 

 Numan,
Please read up a bit on what the OutOfMemoryError actually means.

https://duckduckgo.com/?q=java.lang.OutOfMemoryError%3A+unable+to+create+new+native+thread&t=ffab

Most of the time on Linux it means that you have hit a ulimit on the number of 
processes/threads that your headless user can handle.  By increasing the 
parallelism you did not make the problem go away, you just created a new 
problem, and when you fixed that problem the original problem came back.  You 
either need to increase the ulimit (possibly both hard and soft) for the user 
your worker processes are running as, or you need to find a way to reduce the 
number of threads that you are using.  
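For example (just a sketch; I'm assuming the workers run as a user named "storm", so adjust the name and value to your setup), the process/thread limit can be raised in /etc/security/limits.conf, taking effect on the next login of that user:

storm    soft    nproc    65536
storm    hard    nproc    65536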
 - Bobby 

    On Thursday, May 12, 2016 8:07 AM, numan goceri 
<numangoc...@yahoo.com.INVALID> wrote:
 

 I have implemented a topology by using 1 Spout and 2 Bolts:

|Spout|->|Bolt1|->|Bolt2|

The Kafka producer currently pushes the input rows out of a CSV file (around 2000 rows), and the spout receives them all at once.
Bolt1: writes all incoming tuples into an HBase table (htable_allTuples)
Bolt2: checks all incoming tuples, and once the expected tuple has arrived, it reads the other related tuples from "htable_allTuples" and writes the results into another HBase table (htable_result)
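In code the wiring is roughly the following (a fragment; the spout/bolt class names are placeholders, and the last argument is the parallelism hint):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafkaSpout", new MyKafkaSpout(), 2);
builder.setBolt("allTuplesBolt", new AllTuplesHBaseBolt(), 2)
       .shuffleGrouping("kafkaSpout");
builder.setBolt("resultBolt", new ResultHBaseBolt(), 2)
       .shuffleGrouping("allTuplesBolt");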

In Bolt2, if the conditions are strict enough that in the end I have only 2 rows to write into the result table, everything works fine. But if I relax the conditions in Bolt2 so that there are more result rows (around 18), then Bolt2 throws the following error:
"java.lang.RuntimeException: java.lang.RuntimeException: 
java.lang.OutOfMemoryError: unable to create new native thread at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor.."

The solution for this problem, as I've realized, is parallelism: using more workers, executors and/or tasks.
But when I increase the number of executors and workers, I receive the following HBase connection error:
  "ERROR [main] zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 4 
attempts."
  "baseZNode=/hbase-unsecure Unable to set watcher on znode 
(/hbase-unsecure/hbaseid)"

So I found out that I should increase the value of the "maxClientCnxns" parameter (default: 300). I first set it to 3000 and still received the same error; then I set it to 0, which means no client connection limit between HBase and ZooKeeper.
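(If I understand the property correctly, that is this setting in hbase-site.xml, which backs the default of 300; shown only as an illustration:)

<property>
  <name>hbase.zookeeper.property.maxClientCnxns</name>
  <value>0</value> <!-- 0 = no limit on connections per client host -->
</property>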

And with that, I receive my old error message again: "java.lang.OutOfMemoryError: unable to create new native thread".

I open the HBase table connections once in the "prepare" method and close them all in the "cleanup" method.
Whenever I call the <hbaseTable>.put(..) method, I also call <hbaseTable>.close(); but it still seems like there are lots of threads running in the background.

Do you have any idea how to get rid of these two problems and how to set up a clean topology?

Thanks in advance for the feedback. --- Numan Göceri
