Hi Chamila, This might sound bit dramatic...but what if we write Cartridge Agent in Go Lang? Given the short comings of Python we have had, I'd say it's worth a shot.
Thanks. On Sat, Dec 12, 2015 at 1:34 AM, Chamila De Alwis <chami...@wso2.com> wrote: > Hi devs, > > During the testing of the Python Cartridge Agent in the past few weeks we > were able to identify a few points where performance and functionality > could be improved. > > *1 - Agent's thread utilization * > It was observed that when the agent is kept running for long periods of > time, the thread count spawned would go up and reach the thread limit > giving the following error. > > Exception in thread Thread-85: > Traceback (most recent call last): > File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner > self.run() > File > "/mnt/apache-stratos-python-cartridge-agent-4.1.4/modules/util/asyncscheduledtask.py", > line 65, in run > task_thread.start() > File "/usr/lib/python2.7/threading.py", line 745, in start > _start_new_thread(self.__bootstrap, ()) > error: can't start new thread > > Although I couldn't pinpoint the exact use case which results in this > sudden spike in the thread count, it's most likely caused by the > MessageBrokerHeartBeatChecker class inside subscriber.py file. This is a > periodic job which checks for the liveliness of the connected message > broker. > > We can get rid of this by implementing a callback method of Paho Mqtt > Client's on_disconnect method [1]. > > Furthermore, the ScheduledExecutor class in the asyncscheduledtask.py > file, for each invocation it spawns a new thread. > > while not self.terminated: > time.sleep(self.delay) > task_thread = Thread(target=self.task.execute_task) > task_thread.start() > > > This is not a good practice. It should rather submit the task to a thread > pool. Imesh has an example code of the thread pool usage using the > pythonfutures library[2] for Python 2.7. > > *2. Decoupling the messaging implementation* > > When MQTT was chosen initially as the message broker protocol, the PCA was > designed to only use MQTT, using the Paho MQTT library[3]. However after > coming across several issues in MQTT, it was decided to default back to > AMQP as the message broker protocol for Stratos Components. Except for the > PCA, since at the time, there was no AMQP 1.0 supported Python library. > However, it seems that Apache QPid Proton library[4] provides AMQP 1.0 > compliant clients for message broker communication. > > We can provide a common interface for messaging and provide implementation > for MQTT and AMQP (if Apache QPid Proton proves to be helpful) protocols. > This way we can provide the selection of the protocol to the configuration, > similar to what we have in the Java based components such as AS, CC etc. > > Furthermore, this can greatly help if messaging has to be customized to > cater for custom protocols such as Apache Kafka[5]. It will only be another > implementation of the common interface. > > *3. Remove dependency on the Git Binary* > > Currently the PCA manipulates the Git based artifact repository through > the Git binary file. This is inefficient for several reasons. > > 1. Every Git command is executed as a separate process, created using > os.fork(). This duplicates memory and results in needless marshalling > and unmarshalling of input and output among the processes. > 2. If several commands are executed upon the Git binary at the same > time (ex: of a multi-tenant application), it can be a performance > bottleneck. > 3. This is not platform independent. > > Therefore it will greatly help if we can go for a Git library for Python > which doesn't depend on the Git binary. Dulwich [6] was considered in the > past, but then releases had more features TBD. However recentl releases > seems to have fixed a lot of bug reports and features. It also has plans > to be re licensed as Apache v2.0 which also would help us by making it > shippable. > > *4. Use of the maintenance mode* > > Sajith recently started a discussion on a patching strategy for the PCA > (Thread - [Discuss] Suggesting a patching model for PCA). Patching for > the PCA involves two different scenarios, offline and online. If instances > that are already online needs to be patched the current PCA or Stratos > design doesn't allow such a window. > > Maintenance Mode can be used to signal the Autoscaler that the member has > gone to maintenance and scaling decisions should not be taken on that > member. While in the maintenance mode, the running PCA is gracefully > shutdown and the patched PCA comes up. When it publishes InstanceActivated > event, the member will again be involved in the scaling decision process. > > There are few places that needs further development for this scenario to > work. > > 1. It seems that Stratos would mark a member as Pending Termination if > a maintenance mode event is received. > 2. The PCA doesn't support graceful shutdown right now. It doesn't > take any inputs from outside, and the threads spawned by it are not daemon > threads, which results in a process that needs to be killed because the > threads do not terminate when the main thread goes down. It can be designed > that an input to a periodically checked external file or somehow OS Signals > can set flags on the running process to terminate. > 3. The capability of a PCA to start on an already activated instance > isn't verified. Based on the status of the member that it resides, it > should/should not publish certain events. > > > *5. Update topology model based on the message broker events* > > Currently the topology is repopulated everytime the complete topology > event is received. This was done as a quick fix to update the topology > model. However for large complete topology events, building event objects > and updating contexts can be costly, and on the design front, that is not > the intended use of the complete topology event. The events should > dynamically update the topology status of the agent. For this to happen > another task should first be completed. > > *6. Verify message broker event classes* > > Currently all the message broker events are deserialized in to event > objects as per to the classes defined in modules/event package. However > this doesn't depend or track any changes done to the Stratos Messaging > Component, and therefore can quickly be outdated without a hint on the > build or the tests. The events have to be verified against their > counterparts in the messaging component. > > *7. Decouple log file publishing protocol* > > The PCA has a log publishing feature where a specified list of log files > will be monitored and entries published to a thrift data receiver. However > there can be situations where the server logs (ex: PCA's own log) has to be > monitored from outside (ex: monitoring the agent.log of a Docker container > inside Kubernetes using FluentD [7], or directly publishing agent.log to > ELK). For those situations the log publishing feature is not flexible > enough. If we can introduce a pluggable design for the log event publisher, > it can solve most of these situations with low effort. > > *8. Verify Python 3 compatibility* > > Python 2.7 is said to be supported until 2020 [8], so this is not a major > concern. However since the opinions between Python 2.7 and Python 3 can be > political, it might be good to verify and adjust for Python 3 > compatibility. The PCA was originally written with Python 2.7 in mind, > since it is still the most distributed version by default. > > These changes might not be immediately solved, however they can be > critical in tuning some rough spots in the PCA implementation. WDYT? Ideas? > > > [1] - http://www.hivemq.com/blog/mqtt-client-library-paho-python > [2] - https://github.com/agronholm/pythonfutures/ > [3] - https://eclipse.org/paho/clients/python/ > [4] - > https://qpid.apache.org/releases/qpid-proton-0.11.0/proton/python/book/tutorial.html > [5] - http://kafka.apache.org/documentation.html#gettingStarted > [6] - https://www.dulwich.io/ > [7] - > https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch/fluentd-es-image > [8] - https://hg.python.org/peps/rev/76d43e52d978 > > Regards, > Chamila de Alwis > Committer and PMC Member - Apache Stratos > Software Engineer | WSO2 | +94772207163 > Blog: code.chamiladealwis.com > > > -- Akila Ravihansa Perera WSO2 Inc.; http://wso2.com/ Blog: http://ravihansa3000.blogspot.com