Hi Danula,

Any update on the progress? Did you manage to integrate the
transformations with the wrangler?

Thanks,

On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <hmdanu...@gmail.com>
wrote:

> Hi all,
>
> Here is an update on the current progress of the project and the future
> activities we discussed at the recent meeting.
>
> *Current Progress*
>
> I have completed the phase of creating Spark transformations that
> correspond to the operations available in wrangler.
>
> Operations implemented
> - Fill
> - Split
> - Drop
> - Delete
> - Extract
>
> *Future activities*
>
> - Modify the wrangler interface to suit the current implementation
> - Automate the process of generating Spark transformations
> - Integrate wrangler into the ML workflow
>
> Thanks,
> Danula
>
> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <hmdanu...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> No, we haven't done a review yet.
>> It would be great if we could have one, so that I can discuss with you all
>> and clarify the next steps of the implementation, as you mentioned.
>>
>> Thanks
>> Danula
>>
>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <sup...@wso2.com> wrote:
>>
>>> Hi Danula,
>>>
>>> Did we have a review for the work done so far? If not, shall we have
>>> one? We can clear up any doubts and issues as well.
>>>
>>> Thanks,
>>> Supun
>>>
>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <nir...@wso2.com>
>>> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> Thanks for the update, keep them coming.
>>>>
>>>> On a JavaRDD you can perform a collect() to get a list, AFAIR. Yes,
>>>> this is costly, since it would load the whole dataset into memory. So, is
>>>> this an operation which involves multiple rows?
>>>>
>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <hmdanu...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Supun,
>>>>>
>>>>> I modified the "Fill" operation to add what you mentioned.
>>>>>
>>>>> I used a workaround to implement certain parts of the operations,
>>>>> such as filling with values from rows above and below.
>>>>> I created a List using the toArray() method in JavaRDD and
>>>>> then converted it back to a JavaRDD after the operation.
>>>>>
>>>>> This will be inefficient (in terms of both memory and time) when
>>>>> working with very large data sets. But I think it's important to have
>>>>> these features included. Otherwise a user would be left with a very
>>>>> limited set of operations.
>>>>>
>>>>> Please let me know if you have a different opinion on this.
>>>>>
>>>>> Thanks,
>>>>> Danula
>>>>>
>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <sup...@wso2.com>
>>>>> wrote:
>>>>>
>>>>>>> Somehow there are issues in implementing certain wrangler functions
>>>>>>> due to limitations in JavaRDD used in spark
>>>>>>> e.g. -
>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>> Fold operation
>>>>>>
>>>>>>
>>>>>> Agreed. Since rows are processed in no particular order with Spark,
>>>>>> inter-row operations are not very meaningful.
>>>>>> But you can slightly modify the implementation of the "Fill"
>>>>>> operation, for example to fill values based on an
>>>>>> expression/static-value/mean etc. (not depending on other rows).
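
A row-independent fill along these lines might look like the sketch below.
It is an assumption-laden illustration in plain Java, not the project's
implementation: the names are hypothetical, a column is modelled as a
List<Double>, and null stands for a missing value. The mean is computed once
over the present values; after that, each value is filled without looking at
any other row, so the same function could be passed to JavaRDD.map.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: fill missing values with a static value or the mean.
final class RowIndependentFill {

    /** Mean of the non-null values, or NaN if there are none. */
    static double meanOfPresent(List<Double> column) {
        return column.stream()
                .filter(v -> v != null)
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
    }

    /** Per-value function: depends only on the value itself. */
    static Function<Double, Double> fillWith(double replacement) {
        return v -> v == null ? replacement : v;
    }

    /** Applies the per-value fill to every element of the column. */
    static List<Double> fill(List<Double> column, double replacement) {
        return column.stream()
                .map(fillWith(replacement))
                .collect(Collectors.toList());
    }
}
```

Passing a constant instead of the mean gives the static-value variant.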
>>>>>>
>>>>>> Thanks,
>>>>>> Supun
>>>>>>
>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <sup...@wso2.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Danula,
>>>>>>>
>>>>>>> Sorry for the late reply. Have you got the details you were looking
>>>>>>> for?
>>>>>>>
>>>>>>>> It would be great if I could get to know which wrangler operations
>>>>>>>> are important for a user of the ML
>>>>>>>
>>>>>>>
>>>>>>> Other than the ones you have mentioned in the proposal, I think it's
>>>>>>> better to have a "Translate" operation as well (to create a new
>>>>>>> column based on an existing column).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Supun
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith <
>>>>>>> hmdanu...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I am currently working on generating spark transformations related
>>>>>>>> to the operations available in the data wrangler.
>>>>>>>>
>>>>>>>> Data wrangler provides sufficient parameters to re-create these in
>>>>>>>> Spark. I have successfully implemented the delete and split
>>>>>>>> operations of wrangler in Spark.
>>>>>>>>
>>>>>>>> Once this phase is completed, I can either directly generate these
>>>>>>>> scripts at wrangler or use the JavaScript output and convert it to
>>>>>>>> Spark, depending on the implementation.
>>>>>>>>
>>>>>>>> Somehow there are issues in implementing certain wrangler functions
>>>>>>>> due to limitations in JavaRDD used in spark
>>>>>>>>
>>>>>>>> e.g. -
>>>>>>>> Fill operation - when filling with values from rows above and below
>>>>>>>> Fold operation
>>>>>>>>
>>>>>>>> It would be great if I could get to know which wrangler operations
>>>>>>>> are important for a user of the ML
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Danula
>>>>>>>>
>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando <nir...@wso2.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Danula,
>>>>>>>>>
>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>
>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal Fernando <nir...@wso2.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Danula,
>>>>>>>>>>
>>>>>>>>>> Welcome to GSoC '15! Can you do some research on directly
>>>>>>>>>> generating Spark transformations using Wrangler and come up with a
>>>>>>>>>> summary?
>>>>>>>>>>
>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula Eranjith <
>>>>>>>>>> hmdanu...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>>>>> for GSoC 2015. I am really looking forward to working with you all
>>>>>>>>>>> and contributing to WSO2.
>>>>>>>>>>>
>>>>>>>>>>> I have already completed my primary research on wrangler and
>>>>>>>>>>> would like to meet you to get feedback on the proposed
>>>>>>>>>>> architecture. I am planning to start working on the project before
>>>>>>>>>>> the 25th of May.
>>>>>>>>>>>
>>>>>>>>>>> Thank you,
>>>>>>>>>>> Danula
>>>>>>>>>>>
>>>>>>>>>>> [1] -
>>>>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Thanks & regards,
>>>>>>>>>> Nirmal
>>>>>>>>>>
>>>>>>>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> *Supun Sethunga*
>>>>>>> Software Engineer
>>>>>>> WSO2, Inc.
>>>>>>> http://wso2.com/
>>>>>>> lean | enterprise | middleware
>>>>>>> Mobile : +94 716546324
>>>>>>>


-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev
