Ok… 

After some off-line exchanges with Shashidhar Rao, I came up with an idea…

Apply machine learning to either implement or improve autoscaling up or down 
within a Storm/Akka cluster. 

While I don’t know what constitutes an acceptable PhD thesis, or senior project 
for undergrads… this is a real life problem that actually has some real value. 

First, Storm doesn’t scale down. Unless there have been some improvements in the 
last year, you really can’t easily scale down the number of workers and 
transfer state to another worker. 
Looking at Akka, that would be an easier task because of the actor model. 
However, I don’t know Akka that well, so I can’t say if this is already 
implemented. 

So besides the mechanism to scale (up and down), you then have the machine 
learning problem itself: modeling the load and deciding how to scale properly. 
This could be as simple as a PID controller that watches the queues between 
spout/bolts and bolt/bolt, or something more advanced. This is where the 
research part of the project comes in. (What do you monitor, and how do you 
calculate and determine when to scale up or down, weighing in the cost(s) of 
the action of scaling?) 
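
To make the simple end of that spectrum concrete, here is a minimal sketch of a 
PID-style controller in Python. Everything in it is an assumption for 
illustration: the class name, the gains, and the deadband (a crude stand-in for 
the cost of a scaling action) are hypothetical and not part of any Storm or 
Akka API. Reading the actual queue depth and adding or removing workers are 
left out, and there is no anti-windup on the integral term.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PIDAutoscaler:
    """Hypothetical PID controller that proposes a worker count from queue depth."""
    target_depth: float          # desired number of queued messages
    kp: float                    # proportional gain (workers per queued message)
    ki: float                    # integral gain
    kd: float                    # derivative gain
    min_workers: int = 1
    max_workers: int = 32
    deadband: float = 0.5        # ignore adjustments smaller than this (scaling has a cost)
    _integral: float = field(default=0.0, init=False)
    _prev_error: Optional[float] = field(default=None, init=False)

    def step(self, queue_depth: float, workers: int, dt: float = 1.0) -> int:
        """Return the proposed worker count given the observed queue depth."""
        error = queue_depth - self.target_depth
        self._integral += error * dt
        if self._prev_error is None:
            derivative = 0.0     # no history yet on the first sample
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error

        adjustment = self.kp * error + self.ki * self._integral + self.kd * derivative
        if abs(adjustment) < self.deadband:
            return workers       # change too small to justify a scaling action

        proposed = workers + round(adjustment)
        return max(self.min_workers, min(self.max_workers, proposed))


if __name__ == "__main__":
    controller = PIDAutoscaler(target_depth=100, kp=0.01, ki=0.0, kd=0.0)
    workers = 4
    for depth in (500, 100, 0):  # made-up queue depth samples
        workers = controller.step(depth, workers)
        print(depth, workers)
```

The research question is then everything this sketch hand-waves: which signals 
to feed in besides queue depth, how to tune or learn the gains, and how to 
price a scaling action more honestly than a fixed deadband.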

Again, it’s a worthwhile project, something that actually has business value, 
especially in terms of Lambda and other groovy Greek-lettered names for cluster 
designs (Zeta? ;-) ), 
where you have both M/R (computational) and subjective real time (including 
micro batch) occurring either on the same cluster or within the same DC 
infrastructure. 


Again, I don’t know if this is worthy of a PhD thesis, Master’s thesis, or senior 
project, but it is something that one could sink one’s teeth into, and it 
could potentially lead to a commercial grade project if done properly. 

Good luck with it.

HTH 

-Mike




> On Jul 15, 2015, at 12:40 PM, vaquar khan <vaquar.k...@gmail.com> wrote:
> 
> I would suggest studying Spark, Flink, and Storm, and preparing your research 
> paper based on your understanding and findings.
> 
> Maybe you will invent a new Spark ☺
> 
> Regards, 
> Vaquar khan
> 
> On 16 Jul 2015 00:47, "Michael Segel" <msegel_had...@hotmail.com> wrote:
> Silly question… 
> 
> When thinking about a PhD thesis… do you want to tie it to a specific 
> technology or do you want to investigate an idea but then use a specific 
> technology. 
> Or is this an outdated way of thinking? 
> 
> "I am doing my PHD thesis on large scale machine learning e.g  Online 
> learning, batch and mini batch learning."
> 
> So before we look at technologies like Spark… could the OP break down a more 
> specific concept or idea that he wants to pursue? 
> 
> Looking at what Jorn said… 
> 
> Using machine learning to better predict workloads in terms of managing 
> clusters… This could be interesting… but is it enough for a PhD thesis, or of 
> interest to the OP? 
> 
> 
>> On Jul 15, 2015, at 9:43 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>> 
>> Well, one of the strengths of Spark is standardized general distributed 
>> processing allowing many different types of processing, such as graph 
>> processing, stream processing etc. The limitation is that it is less 
>> performant than a system focusing on only one type of processing (e.g. graph 
>> processing). I miss - and this may not be Spark specific - some artificial 
>> intelligence to manage a cluster, e.g. predicting workloads, how long a job 
>> may run based on previously executed similar jobs etc. Furthermore, many 
>> optimizations you have to do manually, e.g. Bloom filters, partitioning etc. 
>> If you could find some intelligence here as well that does this automatically, 
>> based on previously executed jobs and taking into account that optimizations 
>> themselves change over time, that would be great... You may also explore 
>> feature interaction.
>> 
>> On Tue, Jul 14, 2015 at 7:19, Shashidhar Rao <raoshashidhar...@gmail.com> 
>> wrote:
>> Hi,
>> 
>> I am doing my PHD thesis on large scale machine learning e.g  Online 
>> learning, batch and mini batch learning.
>> 
>> Could somebody help me with ideas especially in the context of Spark and to 
>> the above learning methods. 
>> 
>> Some ideas like improvement to existing algorithms, implementing new 
>> features especially the above learning methods and algorithms that have not 
>> been implemented etc.
>> 
>> If somebody could help me with some ideas it would really accelerate my work.
>> 
>> Plus a few ideas on research papers regarding Spark or Mahout.
>> 
>> Thanks in advance.
>> 
>> Regards 
> 
> 
