Silly question… 

When thinking about a PhD thesis… do you want to tie it to a specific 
technology, or do you want to investigate an idea and then use a specific 
technology to explore it? 
Or is this an outdated way of thinking? 

"I am doing my PhD thesis on large-scale machine learning, e.g. online learning, 
batch and mini-batch learning."

So before we look at technologies like Spark… could the OP break down a more 
specific concept or idea that he wants to pursue? 

Looking at what Jörn said… 

Using machine learning to better predict workloads in terms of managing 
clusters… This could be interesting… but is it enough for a PhD thesis, or of 
interest to the OP? 


> On Jul 15, 2015, at 9:43 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> Well, one of the strengths of Spark is standardized, general distributed 
> processing that allows many different types of processing, such as graph 
> processing, stream processing, etc. The limitation is that it is less 
> performant than a system focusing on only one type of processing (e.g. graph 
> processing). I miss - and this may not be Spark-specific - some artificial 
> intelligence to manage a cluster, e.g. predicting workloads, how long a job 
> may run based on previously executed similar jobs, etc. Furthermore, many 
> optimizations have to be done manually, e.g. Bloom filters, partitioning, etc. 
> It would be great to find some intelligence here as well that does this 
> automatically based on previously executed jobs, taking into account that 
> the optimizations themselves change over time. You may also explore feature 
> interaction.
> 
> On Tue, Jul 14, 2015 at 7:19, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
> Hi,
> 
> I am doing my PhD thesis on large-scale machine learning, e.g. online 
> learning, batch and mini-batch learning.
> 
> Could somebody help me with ideas, especially in the context of Spark and 
> the above learning methods? 
> 
> Some ideas could be improvements to existing algorithms, or implementing new 
> features, especially for the above learning methods, and algorithms that have 
> not been implemented yet.
> 
> If somebody could help me with some ideas it would really accelerate my work.
> 
> Plus a few ideas on research papers regarding Spark or Mahout.
> 
> Thanks in advance.
> 
> Regards 

