Sorry guys! I mistakenly added my question to this thread (Research ideas using Spark). That said, people can ask any question; this Spark user group is for that.
Cheers! 😊

On Wed, Jul 15, 2015 at 9:43 PM, Robin East <robin.e...@xense.co.uk> wrote:

> Well said, Will. I would add that you might want to investigate GraphChi,
> which claims to run a number of large-scale graph processing tasks on a
> single workstation much faster than a very large Hadoop cluster. It would
> be interesting to know how widely applicable GraphChi's approach is and
> what implications it has for parallel/distributed computing approaches. A
> rich seam to mine indeed.
>
> Robin
>
> On 15 Jul 2015, at 14:48, William Temperley <willtemper...@gmail.com> wrote:
>
> There seems to be a bit of confusion here - the OP (doing the PhD) had the
> thread hijacked by someone with a similar name asking a mundane question.
> It would be a shame to send away so rudely someone who may do valuable
> work on Spark.
>
> Shashidhar (not Shahid!) - I'm personally interested in running graph
> algorithms for image segmentation using MLlib and Spark. I've got many
> questions, though - like, is it even going to give me a speed-up?
> (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)
>
> It's not obvious to me which classes of graph algorithms can be
> implemented correctly and efficiently in a highly parallel manner. There's
> tons of work to be done here, I'm sure. Also, look at parallel geospatial
> algorithms - there's a lot of work being done on this.
>
> Best, Will
>
> On 15 July 2015 at 09:01, Vineel Yalamarthy <vineelyalamar...@gmail.com> wrote:
>
>> Hi Daniel
>>
>> Well said
>>
>> Regards
>> Vineel
>>
>> On Tue, Jul 14, 2015, 6:11 AM Daniel Darabos <daniel.dara...@lynxanalytics.com> wrote:
>>
>>> Hi Shahid,
>>> To be honest, I think this question is better suited for Stack Overflow
>>> than for a PhD thesis.
>>>
>>> On Tue, Jul 14, 2015 at 7:42 AM, shahid ashraf <sha...@trialx.com> wrote:
>>>
>>>> Hi
>>>>
>>>> I have a 10-node cluster. I loaded the data onto HDFS, so the number of
>>>> partitions I get is 9.
>>>> I am running a Spark application, and it gets stuck on one of the
>>>> tasks. Looking at the UI, it seems the application is not using all
>>>> nodes for the computation. Attached is a screenshot of the tasks; it
>>>> looks like tasks are placed on each node more than once. Eight tasks
>>>> complete within 7-8 minutes, while one task takes around 30 minutes,
>>>> causing the delay in the results.
>>>>
>>>> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao <raoshashidhar...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am doing my PhD thesis on large-scale machine learning, e.g. online
>>>>> learning, batch and mini-batch learning.
>>>>>
>>>>> Could somebody help me with ideas, especially in the context of Spark
>>>>> and the above learning methods?
>>>>>
>>>>> Some ideas like improvements to existing algorithms, implementing new
>>>>> features, especially the above learning methods, and algorithms that
>>>>> have not been implemented yet, etc.
>>>>>
>>>>> If somebody could help me with some ideas, it would really accelerate
>>>>> my work.
>>>>>
>>>>> Plus a few ideas on research papers regarding Spark or Mahout.
>>>>>
>>>>> Thanks in advance.
>>>>>
>>>>> Regards
>>>>
>>>> --
>>>> with Regards
>>>> Shahid Ashraf
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org

--
with Regards
Shahid Ashraf
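
[Editor's note] Shahid's symptom - eight tasks finishing in 7-8 minutes while one takes 30 - is the classic signature of data skew: one partition holds a "hot" key with far more records than the rest, and with only 9 partitions on a 10-node cluster at most 9 tasks run at once anyway. A common mitigation is to "salt" hot keys with a random suffix before the shuffle and combine the partial results afterwards. Below is a minimal sketch in plain Python (an assumption: a stand-in for Spark's hash partitioner, since no cluster is at hand here; the key names, counts, and `SALT_BUCKETS` value are all hypothetical illustration, not from the thread):

```python
# Sketch: why one hot key makes a single task slow, and how salting
# a key with a random suffix spreads its records across partitions.
import random
from collections import Counter

NUM_PARTITIONS = 9   # matches the 9 partitions reported in the thread
SALT_BUCKETS = 9     # hypothetical salt range; tune to the observed skew

def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Plain-Python stand-in for hash partitioning (Spark hashes the key
    # and takes it modulo the number of partitions).
    return hash(key) % num_partitions

# Skewed data: one hot key dominates, like the 30-minute straggler task.
records = [("hot", i) for i in range(90_000)] + \
          [(f"key{i}", i) for i in range(10_000)]

def partition_sizes(keyed_records):
    return Counter(partition_of(k) for k, _ in keyed_records)

plain = partition_sizes(records)

# Salting: rewrite each key as (key, random_salt) before the shuffle, so
# the hot key's records fan out; aggregate per salted key, then combine
# the partial aggregates per real key in a second, much smaller pass.
salted = partition_sizes(
    ((k, random.randrange(SALT_BUCKETS)), v) for k, v in records
)

print("max partition size before salting:", max(plain.values()))
print("max partition size after salting: ", max(salted.values()))
```

In Spark itself the same idea applies to `reduceByKey`/`groupByKey` pipelines; `repartition` alone raises parallelism but cannot split a single hot key, which is why the straggler survives it.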
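
[Editor's note] The COST page Will links makes a concrete point worth seeing in code: for graphs that fit on one machine, a simple single-threaded algorithm often beats a distributed one outright, so "will it even give me a speed-up?" is a fair question. Here is a minimal sketch of that style - connected components by iterated minimum-label propagation - on a tiny hypothetical graph (not an implementation from the thread or from GraphChi):

```python
# Single-threaded connected components by label propagation: every vertex
# starts with its own id as its label, and we repeatedly push the minimum
# label across each edge until a full pass changes nothing.
def connected_components(edges, num_vertices):
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            lo = min(labels[u], labels[v])
            if labels[u] != lo or labels[v] != lo:
                labels[u] = labels[v] = lo
                changed = True
    return labels

edges = [(0, 1), (1, 2), (3, 4)]  # two components: {0,1,2} and {3,4}
print(connected_components(edges, 5))  # → [0, 0, 0, 3, 3]
```

The same min-label propagation maps naturally onto GraphX's Pregel-style API, which makes it a useful baseline for the distributed-vs-single-machine comparison the COST paper advocates.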