Re: Research ideas using spark

2015-07-16 Thread Michael Segel
Ok… After having some off-line exchanges with Shashidhar Rao, I came up with an idea… Apply machine learning to either implement or improve autoscaling up or down within a Storm/Akka cluster. While I don’t know what constitutes an acceptable PhD thesis, or senior project for undergrads… this

Re: Research ideas using spark

2015-07-15 Thread Vineel Yalamarthy
Hi Daniel, Well said. Regards, Vineel On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos daniel.dara...@lynxanalytics.com wrote: Hi Shahid, To be honest I think this question is better suited for Stack Overflow than for a PhD thesis. On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf

Re: Research ideas using spark

2015-07-15 Thread Akhil Das
Try to repartition it to a higher number (at least 3-4 times the total number of CPU cores). What operation are you doing? If you are doing a join/groupBy sort of operation, the task that is taking so long may be the one holding all the values; in that case you need to use a Partitioner which will
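The advice in this reply can be sketched in plain Python, without Spark, to show why one slow task appears and how a different partitioning spreads the load. Everything here is an illustrative assumption rather than anything from the thread: the 40-core figure, the key names, the `suggested_partitions`, `plain_partition`, and `salted_partition` helpers, and the use of key salting as the custom partitioning scheme.

```python
import zlib
from collections import Counter


def suggested_partitions(total_cores, factor=3):
    """Rule of thumb from the reply: at least 3-4x the total CPU cores."""
    return total_cores * factor


def stable_hash(key):
    # crc32 is deterministic across runs, unlike built-in hash() on str.
    return zlib.crc32(str(key).encode("utf-8"))


def plain_partition(key, num_partitions):
    """Hash partitioning: every record with the same key lands together."""
    return stable_hash(key) % num_partitions


def salted_partition(key, salt, num_partitions):
    """Hypothetical custom partitioner: the salt shifts a key's records
    across several neighbouring partitions instead of just one."""
    return (stable_hash(key) + salt) % num_partitions


# Assumed cluster size: 40 cores in total (a made-up figure).
NUM_PARTITIONS = suggested_partitions(total_cores=40)
NUM_SALTS = 8

# Skewed data set: 900 of 1000 records share the key "hot".
keys = ["hot" if i % 10 else f"cold-{i}" for i in range(1000)]

# Records per partition under each scheme.
plain_load = Counter(plain_partition(k, NUM_PARTITIONS) for k in keys)
salted_load = Counter(
    salted_partition(k, i % NUM_SALTS, NUM_PARTITIONS)
    for i, k in enumerate(keys)
)

print("largest partition, plain hash:", max(plain_load.values()))
print("largest partition, salted:    ", max(salted_load.values()))
```

With plain hashing, all records for the hot key end up in a single partition, so one task does most of the work no matter how many partitions there are. The salted variant trades that for extra bookkeeping: in a real join, the other side would need to be replicated once per salt value so that matching records still meet.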

Re: Research ideas using spark

2015-07-15 Thread shahid ashraf
Sorry guys! I mistakenly added my question to this thread (Research ideas using spark). Moreover, people can ask any question; this Spark user group is for that. Cheers! On Wed, Jul 15, 2015 at 9:43 PM, Robin East robin.e...@xense.co.uk wrote: Well said Will. I would add that you might want

Re: Research ideas using spark

2015-07-15 Thread Ravindra
Look at this: http://www.forbes.com/sites/lisabrownlee/2015/07/10/the-11-trillion-internet-of-things-big-data-and-pattern-of-life-pol-analytics/ On Wed, Jul 15, 2015 at 10:19 PM shahid ashraf sha...@trialx.com wrote: Sorry Guys! I mistakenly added my question to this thread (Research ideas

Re: Research ideas using spark

2015-07-15 Thread Michael Segel
Silly question… When thinking about a PhD thesis… do you want to tie it to a specific technology, or do you want to investigate an idea and then use a specific technology? Or is this an outdated way of thinking? I am doing my PhD thesis on large-scale machine learning, e.g. online learning,

Re: Research ideas using spark

2015-07-15 Thread vaquar khan
I would suggest studying Spark, Flink, and Storm, and preparing your research paper based on your understanding and findings. Maybe you will invent a new Spark ☺ Regards, Vaquar khan On 16 Jul 2015 00:47, Michael Segel msegel_had...@hotmail.com wrote: Silly question… When thinking about a PhD thesis…

Re: Research ideas using spark

2015-07-15 Thread Jörn Franke
Well, one of the strengths of Spark is standardized, general distributed processing that allows many different types of processing, such as graph processing, stream processing, etc. The limitation is that it is less performant than a system focusing on only one type of processing (e.g. graph processing).

Re: Research ideas using spark

2015-07-15 Thread Robin East
Well said, Will. I would add that you might want to investigate GraphChi, which claims to be able to run a number of large-scale graph processing tasks on a workstation much quicker than a very large Hadoop cluster. It would be interesting to know how widely applicable the approach GraphChi takes

Re: Research ideas using spark

2015-07-15 Thread William Temperley
There seems to be a bit of confusion here - the OP (doing the PhD) had the thread hijacked by someone with a similar name asking a mundane question. It would be a shame to so rudely send away someone who may do valuable work on Spark. Shashidhar (not Shahid!) I'm personally interested in running

Re: Research ideas using spark

2015-07-14 Thread Daniel Darabos
Hi Shahid, To be honest I think this question is better suited for Stack Overflow than for a PhD thesis. On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf sha...@trialx.com wrote: Hi, I have a 10-node cluster. I loaded the data onto HDFS, so the number of partitions I get is 9. I am running a Spark

Research ideas using spark

2015-07-13 Thread Shashidhar Rao
Hi, I am doing my PhD thesis on large-scale machine learning, e.g. online learning, batch and mini-batch learning. Could somebody help me with ideas, especially in the context of Spark and the above learning methods? Some ideas like improvements to existing algorithms, implementing new features