Hi Nirmal, Would it be possible to get some sample data sets that are more likely to need pre-processing with wrangler? I am currently testing my implementation against small, more general data sets.
I have also checked the data sets available at [1], but there is not much to pre-process in them, as they are already ready to be fed to ML.

[1] - https://github.com/wso2/product-ml/tree/master/modules/samples

Thanks,
Danula

On Thu, Jul 16, 2015 at 10:15 PM, Nirmal Fernando <nir...@wso2.com> wrote:

> Thanks Danula.
>
> On Thu, Jul 16, 2015 at 10:07 PM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sorry for not keeping you in the loop.
>>
>> After considering and experimenting with several options, I am using the
>> JavaScript code generated by wrangler to implement the operations in Spark. I have
>> used regular expressions to extract the operations, parameters and values,
>> and mapped them to the Spark transformations I previously developed.
>>
>> The code generated by wrangler for certain functions has nested operations.
>>
>> (1)
>>
>> /* Fill split3 with values from above */
>> w.add(dw.fill().column(["split3"])
>>   .table(0)
>>   .status("active")
>>   .drop(false)
>>   .direction("down")
>>   .method("copy")
>>   .row(undefined)
>> )
>>
>> (2)
>>
>> /* Delete rows where split1 is null */
>> w.add(dw.filter().column([])
>>   .table(0)
>>   .status("active")
>>   .drop(false)
>>   .row(dw.row().column([])
>>     .table(0)
>>     .status("active")
>>     .drop(false)
>>     .conditions([dw.is_null().column([])
>>       .table(0)
>>       .status("active")
>>       .drop(false)
>>       .lcol("split1")
>>       .value(undefined)
>>       .op_str("is null")
>>     ])
>>   )
>> )
>>
>> I have succeeded in parsing operations similar to (1) above and am
>> currently working on extending the parser to handle operations similar to (2).
>>
>> The next step is to automate the generation of Spark transformations.
>>
>> Thanks,
>> Danula
>>
>> On Wed, Jul 15, 2015 at 7:32 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> Hi Danula,
>>>
>>> Please send an update at least every week.
>>>
>>> On Wed, Jul 15, 2015 at 5:51 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Any update on the progress? Did you manage to integrate the
>>>> transformations with the wrangler?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Here is an update on the current progress of the project and the future
>>>>> activities we discussed at the recent meeting.
>>>>>
>>>>> *Current Progress*
>>>>>
>>>>> I have completed the phase of creating Spark transformations for the
>>>>> operations available in wrangler.
>>>>>
>>>>> Operations implemented
>>>>> - Fill
>>>>> - Split
>>>>> - Drop
>>>>> - Delete
>>>>> - Extract
>>>>>
>>>>> *Future activities*
>>>>>
>>>>> - Modify the wrangler interface to suit the current implementation
>>>>> - Automate the process of generating Spark transformations
>>>>> - Integrate wrangler into the ML workflow
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> No, we haven't done a review yet.
>>>>>> It would be great if we could have one, so that I can discuss with you
>>>>>> all and clarify the next steps of the implementation, as you mentioned.
>>>>>>
>>>>>> Thanks
>>>>>> Danula
>>>>>>
>>>>>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Did we have a review of the work done so far? If not, shall we have
>>>>>>> one? We can clear up any doubts and issues as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Hi Danula,
>>>>>>>>
>>>>>>>> Thanks for the update, keep them coming.
>>>>>>>>
>>>>>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes,
>>>>>>>> this is costly, since it would load the whole dataset into memory.
>>>>>>>> So, is this an operation which involves multiple rows?
>>>>>>>>
>>>>>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Supun,
>>>>>>>>>
>>>>>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>>>>>
>>>>>>>>> I used a workaround to implement certain parts of the
>>>>>>>>> operations, such as filling with values from rows above and below.
>>>>>>>>> I created a List using the toArray() method of JavaRDD
>>>>>>>>> and then converted it back to a JavaRDD after the operation.
>>>>>>>>>
>>>>>>>>> This will be inefficient (in terms of both memory and time) when
>>>>>>>>> working with very large data sets. But I think it's important to have
>>>>>>>>> these features included. Otherwise a user would be left with a very
>>>>>>>>> limited set of operations.
>>>>>>>>>
>>>>>>>>> Please let me know if you have a different opinion on this.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Danula
>>>>>>>>>
>>>>>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>>> Somehow there are issues in implementing certain wrangler
>>>>>>>>>>> functions due to limitations in JavaRDD used in Spark
>>>>>>>>>>> e.g. -
>>>>>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>>>>>> Fold operation
>>>>>>>>>>
>>>>>>>>>> Agreed. Since rows are processed in no particular order with Spark,
>>>>>>>>>> inter-row operations are not very meaningful.
>>>>>>>>>> But you can slightly modify the implementation of the "Fill"
>>>>>>>>>> operation, for example to fill values based on an expression, a
>>>>>>>>>> static value, the mean, etc. (not depending on other rows).
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the late reply.
>>>>>>>>>>> Have you got the details you were looking for?
>>>>>>>>>>>
>>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>>
>>>>>>>>>>> Other than the ones you have mentioned in the proposal, I think
>>>>>>>>>>> it's better to have a "Translate" operation as well (to create a
>>>>>>>>>>> new column based on an existing column).
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I am currently working on generating Spark transformations
>>>>>>>>>>>> for the operations available in the data wrangler.
>>>>>>>>>>>>
>>>>>>>>>>>> Data wrangler provides sufficient parameters to re-create these
>>>>>>>>>>>> in Spark. I have successfully implemented the delete and split
>>>>>>>>>>>> operations of wrangler in Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Once this phase is completed, I can either directly generate
>>>>>>>>>>>> these scripts in wrangler or use the JavaScript output and convert
>>>>>>>>>>>> it to Spark, depending on the implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> Somehow there are issues in implementing certain wrangler
>>>>>>>>>>>> functions due to limitations in JavaRDD used in Spark
>>>>>>>>>>>>
>>>>>>>>>>>> e.g. -
>>>>>>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>>>>>>> Fold operation
>>>>>>>>>>>>
>>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Danula
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>>>>>>>>> generating Spark transformations using Wrangler and come up with
>>>>>>>>>>>>>> a summary?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <hmdanu...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for selecting my proposal [1] for GSoC 2015.
>>>>>>>>>>>>>>> I am really looking forward to working with you all and
>>>>>>>>>>>>>>> contributing to WSO2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have already completed my primary research on wrangler and
>>>>>>>>>>>>>>> would like to meet you to get feedback on the proposed
>>>>>>>>>>>>>>> architecture. I am planning to start working on the project
>>>>>>>>>>>>>>> before 25th of May.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] - https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>
>>>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>> Mobile : +94 716546324
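The parsing step described in the thread (pulling the operation, parameters and values out of wrangler's generated JavaScript with regular expressions, then mapping them to a transformation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the project: it is plain Python rather than Spark's Java API, it handles only flat calls like (1) and not nested ones like (2), and the names parse_flat_operation, fill_down and apply_operation are hypothetical. The fill runs over an ordered in-memory list, which mirrors the collect-then-process workaround discussed in the thread and carries the same memory cost.

```python
import re

# Wrangler output for operation (1) in the thread, reproduced as a string.
SCRIPT = '''
/* Fill split3 with values from above */
w.add(dw.fill().column(["split3"])
  .table(0)
  .status("active")
  .drop(false)
  .direction("down")
  .method("copy")
  .row(undefined)
)
'''

# Matches the operation name in "w.add(dw.<name>()".
OP_RE = re.compile(r'w\.add\(dw\.(\w+)\(\)')
# Matches one chained ".name(argument)" call. The argument may not contain
# parentheses, so nested operations such as (2) are out of scope here.
PARAM_RE = re.compile(r'\.(\w+)\(([^()]*)\)')


def parse_flat_operation(script):
    """Return (operation, params) for a single non-nested wrangler call."""
    op_match = OP_RE.search(script)
    if op_match is None:
        raise ValueError("no wrangler operation found")
    params = {}
    # Only look at the text after "w.add(dw.<name>()".
    for name, raw in PARAM_RE.findall(script[op_match.end():]):
        params[name] = raw.strip()  # raw JavaScript literal, kept as text
    return op_match.group(1), params


def fill_down(rows, column_index):
    """Copy the last non-empty value downwards into empty cells of one column."""
    filled, last = [], None
    for row in rows:
        row = list(row)  # do not mutate the caller's rows
        if row[column_index] in (None, ""):
            row[column_index] = last
        else:
            last = row[column_index]
        filled.append(row)
    return filled


def apply_operation(op, params, rows, header):
    """Map a parsed operation to a transformation (only fill-down is sketched)."""
    if op == "fill" and params.get("direction") == '"down"':
        column = params["column"].strip('[]"')  # '["split3"]' -> 'split3'
        return fill_down(rows, header.index(column))
    raise NotImplementedError(op)


header = ["split1", "split2", "split3"]
rows = [["a", "1", "x"], ["b", "2", ""], ["c", "3", None], ["d", "4", "y"]]
op, params = parse_flat_operation(SCRIPT)
result = apply_operation(op, params, rows, header)
print(result)
# [['a', '1', 'x'], ['b', '2', 'x'], ['c', '3', 'x'], ['d', '4', 'y']]
```

Handling nested calls like (2) would need more than a single flat pattern, since the argument of .row(...) is itself a chain of calls; a small recursive parser or a real JavaScript parser is probably a better fit there than regular expressions alone.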
_______________________________________________ Dev mailing list Dev@wso2.org http://wso2.org/cgi-bin/mailman/listinfo/dev