Look at this: http://www.forbes.com/sites/lisabrownlee/2015/07/10/the-11-trillion-internet-of-things-big-data-and-pattern-of-life-pol-analytics/
On Wed, Jul 15, 2015 at 10:19 PM shahid ashraf <sha...@trialx.com> wrote:
> Sorry guys!
>
> I mistakenly added my question to this thread (Research ideas using
> Spark). Moreover, people can ask any question; this Spark user group is
> for that.
>
> Cheers!
> 😊
>
> On Wed, Jul 15, 2015 at 9:43 PM, Robin East <robin.e...@xense.co.uk>
> wrote:
>
>> Well said Will. I would add that you might want to investigate GraphChi,
>> which claims to be able to run a number of large-scale graph processing
>> tasks on a workstation much quicker than a very large Hadoop cluster. It
>> would be interesting to know how widely applicable the approach GraphChi
>> takes is and what implications it has for parallel/distributed computing
>> approaches. A rich seam to mine indeed.
>>
>> Robin
>>
>> On 15 Jul 2015, at 14:48, William Temperley <willtemper...@gmail.com>
>> wrote:
>>
>> There seems to be a bit of confusion here - the OP (doing the PhD) had
>> the thread hijacked by someone with a similar name asking a mundane
>> question.
>>
>> It would be a shame to send someone away so rudely, who may do valuable
>> work on Spark.
>>
>> Sashidar (not Sashid!) I'm personally interested in running graph
>> algorithms for image segmentation using MLlib and Spark. I've got many
>> questions though - like is it even going to give me a speed-up?
>> (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
>>
>> It's not obvious to me which classes of graph algorithms can be
>> implemented correctly and efficiently in a highly parallel manner. There's
>> tons of work to be done here, I'm sure. Also, look at parallel geospatial
>> algorithms - there's a lot of work being done on this.
>>
>> Best, Will
>>
>> On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamar...@gmail.com>
>> wrote:
>>
>>> Hi Daniel
>>>
>>> Well said
>>>
>>> Regards
>>> Vineel
>>>
>>> On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos
>>> <daniel.dara...@lynxanalytics.com> wrote:
>>>
>>>> Hi Shahid,
>>>> To be honest I think this question is better suited for Stack Overflow
>>>> than for a PhD thesis.
>>>>
>>>> On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf <sha...@trialx.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a 10-node cluster. I loaded the data onto HDFS, so the number
>>>>> of partitions I get is 9. I am running a Spark application and it gets
>>>>> stuck on one of the tasks. Looking at the UI, it seems the application
>>>>> is not using all nodes to do the calculations. Attached is a screenshot
>>>>> of the tasks; it seems tasks are put on each node more than once.
>>>>> Looking at the tasks, 8 of them complete within 7-8 minutes and one
>>>>> takes around 30 minutes, which delays the results.
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao
>>>>> <raoshashidhar...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am doing my PhD thesis on large-scale machine learning, e.g. online
>>>>>> learning, batch and mini-batch learning.
>>>>>>
>>>>>> Could somebody help me with ideas, especially in the context of Spark
>>>>>> and the above learning methods?
>>>>>>
>>>>>> Some ideas like improvements to existing algorithms, implementing new
>>>>>> features (especially for the above learning methods), and algorithms
>>>>>> that have not been implemented yet.
>>>>>>
>>>>>> If somebody could help me with some ideas it would really accelerate
>>>>>> my work.
>>>>>>
>>>>>> Plus a few ideas on research papers regarding Spark or Mahout.
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>
>>>>> --
>>>>> with Regards
>>>>> Shahid Ashraf
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>
>
> --
> with Regards
> Shahid Ashraf