Hi Nirmal,

Would it be possible to get some sample data sets that are more likely to
need pre-processing with wrangler? I am currently testing my implementations
against small, more general data sets.

I have also checked the datasets available at [1], but there is not much
to pre-process in them as they are ready to be fed to ML.

[1] - https://github.com/wso2/product-ml/tree/master/modules/samples

Thanks,
Danula

On Thu, Jul 16, 2015 at 10:15 PM, Nirmal Fernando <nir...@wso2.com> wrote:

> Thanks Danula.
>
> On Thu, Jul 16, 2015 at 10:07 PM, Danula Eranjith <hmdanu...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Sorry for not keeping you in the loop.
>>
>> After considering and experimenting with several options, I am using the
>> JavaScript code generated by wrangler and implementing its operations in
>> Spark. I have used regular expressions to extract the operations,
>> parameters and values, and mapped them to the Spark transformations I
>> previously developed.
>>
>> The code generated by wrangler for certain functions has nested
>> operations.
>>
>> (1)
>>
>> /* Fill split3  with values from above */
>> w.add(dw.fill().column(["split3"])
>> .table(0)
>> .status("active")
>> .drop(false)
>> .direction("down")
>> .method("copy")
>> .row(undefined)
>> )
>>
>> (2)
>>
>> /* Delete  rows where split1 is null */
>> w.add(dw.filter().column([])
>> .table(0)
>> .status("active")
>> .drop(false)
>> .row(dw.row().column([])
>> .table(0)
>> .status("active")
>> .drop(false)
>> .conditions([dw.is_null().column([])
>> .table(0)
>> .status("active")
>> .drop(false)
>> .lcol("split1")
>> .value(undefined)
>> .op_str("is null")
>> ])
>> )
>> )
>>
>> I have succeeded in parsing operations similar to (1) above and am
>> currently working on extending the parser to handle operations similar
>> to (2).
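
A minimal sketch of the regex extraction for a flat operation like (1),
written in Python for brevity rather than in the Java/Spark code used in the
project. The patterns and the parse_flat_operation name are illustrative
assumptions, not the actual implementation, and nested operations like (2)
are deliberately not handled:

```python
import re

# Sample text in the shape of operation (1) quoted above.
WRANGLER_JS = '''
/* Fill split3  with values from above */
w.add(dw.fill().column(["split3"])
.table(0)
.status("active")
.drop(false)
.direction("down")
.method("copy")
.row(undefined)
)
'''

# Operation name: the identifier after "dw." in "w.add(dw.<name>()".
OP_RE = re.compile(r'w\.add\(dw\.(\w+)\(\)')
# First chained call, which shares a line with "w.add(dw.<name>()".
FIRST_PARAM_RE = re.compile(r'w\.add\(dw\.\w+\(\)\.(\w+)\((.*)\)\s*$', re.MULTILINE)
# Remaining chained calls, one per line: .name(argument)
PARAM_RE = re.compile(r'^\.(\w+)\((.*)\)\s*$', re.MULTILINE)


def parse_flat_operation(js):
    """Return (operation, params) for one un-nested w.add(...) statement.

    Parameter values are kept as raw source text, e.g. '"down"' or 'undefined'.
    """
    op_match = OP_RE.search(js)
    if op_match is None:
        raise ValueError("no w.add(dw.<op>() ...) statement found")
    params = {}
    first = FIRST_PARAM_RE.search(js)
    if first:
        params[first.group(1)] = first.group(2)
    for name, raw in PARAM_RE.findall(js):
        params[name] = raw
    return op_match.group(1), params


operation, parameters = parse_flat_operation(WRANGLER_JS)
print(operation, parameters)
```

The extracted operation name and parameter map would then be used to pick
and configure the matching Spark transformation.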
>>
>> The next step is to automate the generation of Spark transformations.
>>
>> Thanks,
>> Danula
>>
>> On Wed, Jul 15, 2015 at 7:32 PM, Nirmal Fernando <nir...@wso2.com> wrote:
>>
>>> Hi Danula,
>>>
>>> Please send an update at least every week.
>>>
>>> On Wed, Jul 15, 2015 at 5:51 PM, Supun Sethunga <sup...@wso2.com> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Any update on the progress? Did you manage to integrate the
>>>> transformations with the wrangler?
>>>>
>>>> Thanks,
>>>>
>>>> On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <hmdanu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Here is an update on the current progress of the project and the
>>>>> future activities we discussed at the recent meeting.
>>>>>
>>>>> *Current Progress*
>>>>>
>>>>> I have completed the phase of creating spark transformations relevant
>>>>> to operations available in wrangler.
>>>>>
>>>>> Operations implemented
>>>>> - Fill
>>>>> - Split
>>>>> - Drop
>>>>> - Delete
>>>>> - Extract
>>>>>
>>>>> *Future activities*
>>>>>
>>>>> - Modify the wrangler interface to suit the current implementation
>>>>> - Automate the process of generating Spark transformations
>>>>> - Integrate wrangler into the ML workflow
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <hmdanu...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> No, we haven't done a review yet.
>>>>>> It would be great if we could have one so that I can discuss with you
>>>>>> all and clarify the next steps of the implementation as you mentioned.
>>>>>>
>>>>>> Thanks
>>>>>> Danula
>>>>>>
>>>>>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <sup...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Did we have a review for the work done so far? If not, shall we have
>>>>>>> one? We can clear up any doubts and issues as well.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <nir...@wso2.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Danula,
>>>>>>>>
>>>>>>>> Thanks for the update, keep them coming.
>>>>>>>>
>>>>>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes,
>>>>>>>> this is costly, since it would load the whole dataset into the
>>>>>>>> driver's memory. So, is this an operation which involves multiple rows?
>>>>>>>>
>>>>>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <
>>>>>>>> hmdanu...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Supun,
>>>>>>>>>
>>>>>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>>>>>
>>>>>>>>> I used a workaround to implement certain parts of the
>>>>>>>>> operations, such as filling with values from rows above and below.
>>>>>>>>> I created a List using the toArray() method of JavaRDD and then
>>>>>>>>> converted it back to a JavaRDD after the operation.
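
The fill-from-above step on the collected list can be sketched roughly as
follows (Python for brevity; the project code is Java on JavaRDD). The
fill_down name and the treatment of None and empty strings as missing values
are assumptions made for illustration:

```python
def fill_down(rows, col):
    """Copy the last non-empty value of column `col` into empty cells below it.

    `rows` is an ordered list of rows (each row a list of cell values), i.e.
    the data after it has been collected out of the RDD.
    """
    filled = []
    last_seen = None
    for row in rows:
        row = list(row)  # copy so the input rows are not mutated
        if row[col] in (None, ""):
            # Leading empty cells stay empty: there is nothing above to copy.
            if last_seen is not None:
                row[col] = last_seen
        else:
            last_seen = row[col]
        filled.append(row)
    return filled


rows = [["a", "x"], ["b", ""], ["c", None], ["d", "y"], ["e", ""]]
print(fill_down(rows, 1))
```

The single pass depends on row order, which is why the whole data set has to
be available as one ordered list first.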
>>>>>>>>>
>>>>>>>>> This will be inefficient (in terms of both memory and time) when
>>>>>>>>> working with very large data sets. But I think it's important to have
>>>>>>>>> these features included. Otherwise a user would be left with a very
>>>>>>>>> limited set of operations.
>>>>>>>>>
>>>>>>>>> Please let me know if you have a different opinion on this.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Danula
>>>>>>>>>
>>>>>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <sup...@wso2.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>> However, there are issues in implementing certain wrangler
>>>>>>>>>>> functions due to limitations of the JavaRDD used in Spark
>>>>>>>>>>> e.g. -
>>>>>>>>>>> Fill operation - when filling with values from rows above and
>>>>>>>>>>> below
>>>>>>>>>>> Fold operation
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Agreed. Since Spark processes rows in parallel, in no particular
>>>>>>>>>> order, inter-row operations are not very meaningful.
>>>>>>>>>> But you can slightly modify the implementation of the "Fill"
>>>>>>>>>> operation, for example to fill values based on an expression, a
>>>>>>>>>> static value, the mean, etc. (not depending on other rows).
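
A rough sketch of such a row-independent fill, here using the column mean
(Python for brevity; the column_mean and fill_with names are hypothetical).
In Spark terms the mean would be one aggregation over the data set and the
fill a plain per-row map, so no row needs to see its neighbours:

```python
def column_mean(rows, col):
    """Mean of the non-empty values in column `col` (an aggregate step)."""
    values = [float(row[col]) for row in rows if row[col] not in (None, "")]
    if not values:
        raise ValueError("column has no values to average")
    return sum(values) / len(values)


def fill_with(rows, col, value):
    """Replace empty cells in column `col` with `value` (a per-row map step)."""
    return [
        row[:col] + [value] + row[col + 1:] if row[col] in (None, "") else list(row)
        for row in rows
    ]


data = [["a", "1"], ["b", ""], ["c", "3"]]
mean = column_mean(data, 1)
print(fill_with(data, 1, mean))
```

A static value or the result of an expression can be passed to fill_with in
the same way.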
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Supun
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <sup...@wso2.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the late reply. Have you got the details you were
>>>>>>>>>>> looking for?
>>>>>>>>>>>
>>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Other than the ones you have mentioned in the proposal, I think
>>>>>>>>>>> it's better to have a "Translate" operation as well (to create a
>>>>>>>>>>> new column based on an existing column).
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Supun
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <
>>>>>>>>>>> hmdanu...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I am currently working on generating spark transformations
>>>>>>>>>>>> related to the operations available in the data wrangler.
>>>>>>>>>>>>
>>>>>>>>>>>> Data wrangler provides sufficient parameters to re-create these
>>>>>>>>>>>> in Spark. I have successfully implemented the delete and split
>>>>>>>>>>>> operations of wrangler in Spark.
>>>>>>>>>>>>
>>>>>>>>>>>> Once this phase is completed, I can either directly generate
>>>>>>>>>>>> these scripts at wrangler or use the JavaScript output and convert
>>>>>>>>>>>> it to Spark, depending on the implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> However, there are issues in implementing certain wrangler
>>>>>>>>>>>> functions due to limitations of the JavaRDD used in Spark
>>>>>>>>>>>>
>>>>>>>>>>>> e.g. -
>>>>>>>>>>>> Fill operation - when filling with values from rows above and
>>>>>>>>>>>> below
>>>>>>>>>>>> Fold operation
>>>>>>>>>>>>
>>>>>>>>>>>> It would be great if I could get to know which wrangler
>>>>>>>>>>>> operations are important for a user of the ML
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Danula
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <
>>>>>>>>>>>> nir...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <
>>>>>>>>>>>>> nir...@wso2.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>>>>>>>>> generating Spark transformations using Wrangler and come up with
>>>>>>>>>>>>>> a summary?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>>>>>>>>> hmdanu...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for selecting my proposal [1] for GSoC 2015. I am
>>>>>>>>>>>>>>> really looking forward to working with you all and contributing
>>>>>>>>>>>>>>> to WSO2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have already completed my primary research on wrangler and
>>>>>>>>>>>>>>> would like to meet you to get feedback on the proposed
>>>>>>>>>>>>>>> architecture. I am planning to start working on the project
>>>>>>>>>>>>>>> before the 25th of May.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>> Software Engineer
>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Thanks & regards,
>>>>>>>> Nirmal
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> *Supun Sethunga*
>>>> Software Engineer
>>>> WSO2, Inc.
>>>> http://wso2.com/
>>>> lean | enterprise | middleware
>>>> Mobile : +94 716546324
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thanks & regards,
>>> Nirmal
>>>
>>>
>>>
>>>
>>
>
>
> --
>
> Thanks & regards,
> Nirmal
>
>
>
>
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
