Hi Danula,

Any update on the progress? Did you manage to integrate the transformations with the wrangler?
Thanks,

On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
> Hi all,
>
> Here is an update on the current progress of the project and future activities, as we discussed at the recent meeting.
>
> *Current Progress*
>
> I have completed the phase of creating Spark transformations for the operations available in the wrangler.
>
> Operations implemented
> - Fill
> - Split
> - Drop
> - Delete
> - Extract
>
> *Future activities*
>
> - Modify the wrangler interface to suit the current implementation
> - Automate the process of generating Spark transformations
> - Integrate the wrangler into the ML workflow
>
> Thanks,
> Danula
>
> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>
>> Hi all,
>>
>> No, we haven't done a review yet.
>> It would be great if we could have one, so that I can discuss with you all and clarify the next steps of the implementation, as you mentioned.
>>
>> Thanks
>> Danula
>>
>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi Danula,
>>>
>>> Did we have a review of the work done so far? If not, shall we have one? We can clear up any doubts and issues as well.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Thanks for the update, keep them coming.
>>>>
>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes, this is costly, since it would load the whole dataset into the driver's memory. So, is this an operation that involves multiple rows?
>>>>
>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>
>>>>> I used a workaround to implement certain parts of the operations, such as filling with values from rows above and below.
>>>>> I created a List implementation using the toArray() method of JavaRDD and then converted it back to a JavaRDD after the operation.
>>>>>
>>>>> This will be inefficient (in terms of both memory and time) when working with very large data sets. But I think it's important to include these features; otherwise a user would be left with a very limited set of operations.
>>>>>
>>>>> Please let me know if you have a different opinion on this.
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>
>>>>>>> However, there are issues in implementing certain wrangler functions due to limitations of the JavaRDD used in Spark
>>>>>>> e.g. -
>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>> Fold operation
>>>>>>
>>>>>>
>>>>>> Agreed. Since Spark processes rows in parallel across partitions, with no guaranteed order, inter-row operations are not very meaningful.
>>>>>> But you can slightly modify the implementation of the "Fill" operation, e.g. to fill values based on an expression, a static value, the mean, etc. (not depending on other rows).
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Sorry for the late reply. Have you got the details you were looking for?
>>>>>>>
>>>>>>>> It would be great if I could get to know which wrangler operations are important for a user of the ML
>>>>>>>
>>>>>>>
>>>>>>> Other than the ones you mentioned in the proposal, I think it's better to have the "Translate" operation as well (to create a new column based on an existing column).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am currently working on generating Spark transformations for the operations available in the data wrangler.
>>>>>>>>
>>>>>>>> The data wrangler provides sufficient parameters to re-create these operations in Spark. I have successfully implemented the Delete and Split operations of the wrangler in Spark.
>>>>>>>>
>>>>>>>> Once this phase is completed, I can either generate these scripts directly in the wrangler, or take its JavaScript output and convert it to Spark, depending on the implementation.
>>>>>>>>
>>>>>>>> However, there are issues in implementing certain wrangler functions due to limitations of the JavaRDD used in Spark
>>>>>>>>
>>>>>>>> e.g. -
>>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>>> Fold operation
>>>>>>>>
>>>>>>>> It would be great if I could get to know which wrangler operations are important for a user of the ML
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Danula
>>>>>>>>
>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Danula,
>>>>>>>>>
>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>
>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Danula,
>>>>>>>>>>
>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly generating Spark transformations using Wrangler and come up with a summary?
>>>>>>>>>>
>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for selecting my proposal [1] for GSoC 2015. I am really looking forward to working with you all and contributing to WSO2.
>>>>>>>>>>>
>>>>>>>>>>> I have already completed my primary research on the wrangler and would like to meet you to get feedback on the proposed architecture. I am planning to start working on the project before the 25th of May.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Danula
>>>>>>>>>>>
>>>>>>>>>>> [1] - https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks & regards,
>>>>>>>>>> Nirmal
>>>>>>>>>>
>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>
>>>>>>> --
>>>>>>> *Supun Sethunga*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> http://wso2.com/
>>>>>>> lean | enterprise | middleware
>>>>>>> Mobile : +94 716546324
>

--
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
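The difference raised in this thread between row-independent fills (static value, mean) and inter-row fills (value from the row above) can be illustrated with a minimal sketch. It assumes plain Python lists standing in for Spark partitions of a single column, with None marking a missing cell; this is not the Spark API, and all function names are hypothetical. It shows why a fill-down run inside each partition leaves a gap at a partition boundary, and why gathering every row first (as collect()/toArray() does) restores correctness at the cost of holding the whole column in one process's memory.

```python
from typing import List, Optional

# Hypothetical model: one column of a dataset, split into partitions.
# None marks a missing cell. Plain lists stand in for Spark partitions.
Partition = List[Optional[float]]


def fill_constant(partitions: List[Partition], value: float) -> List[Partition]:
    """Row-independent fill: each cell is decided on its own, so every
    partition can be processed separately (like a map over an RDD)."""
    return [[value if x is None else x for x in part] for part in partitions]


def fill_mean(partitions: List[Partition]) -> List[Partition]:
    """Fill with the column mean: one global aggregate, then a
    row-independent fill. Row order never matters."""
    present = [x for part in partitions for x in part if x is not None]
    if not present:
        # Nothing to average; return unchanged copies.
        return [list(part) for part in partitions]
    return fill_constant(partitions, sum(present) / len(present))


def _fill_down(rows: List[Optional[float]]) -> List[Optional[float]]:
    """Sequential 'fill from the row above' over rows in order."""
    last: Optional[float] = None
    out: List[Optional[float]] = []
    for x in rows:
        if x is not None:
            last = x
        out.append(last)
    return out


def fill_down_per_partition(partitions: List[Partition]) -> List[Partition]:
    """Inter-row fill run inside each partition only: a gap at the start of
    a partition cannot see the last value of the previous partition."""
    return [_fill_down(part) for part in partitions]


def fill_down_collected(partitions: List[Partition]) -> List[Optional[float]]:
    """Workaround in the spirit of the thread: gather all rows in order
    (as collect() would) and make one pass. Correct, but every row must
    fit in one process's memory."""
    return _fill_down([x for part in partitions for x in part])


if __name__ == "__main__":
    parts: List[Partition] = [[1.0, None], [None, 4.0]]
    print(fill_constant(parts, 0.0))       # [[1.0, 0.0], [0.0, 4.0]]
    print(fill_mean(parts))                # [[1.0, 2.5], [2.5, 4.0]]
    print(fill_down_per_partition(parts))  # [[1.0, 1.0], [None, 4.0]]
    print(fill_down_collected(parts))      # [1.0, 1.0, 1.0, 4.0]
```

The static-value and mean fills never look at a neighbouring row, which is why they fit Spark's partitioned model directly. A scalable fill-down would need an extra pass that carries each partition's last non-missing value into the next partition; the sketch does not attempt that.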
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev