Any update?

On Fri, Aug 7, 2015 at 10:13 AM, Supun Sethunga <[email protected]> wrote:

> Hi Danula,
>
> Sorry I couldn't join the meeting. Could you please share the meeting/review
> notes, along with the progress on the suggestions and what is left to be
> done overall?
>
> Thanks,
> Supun
>
> On Wed, Aug 5, 2015 at 3:47 AM, Nirmal Fernando <[email protected]> wrote:
>
>> Hi Danula,
>>
>> It should be a JavaRDD<String[]>, where each row represents the feature
>> vector as a String[].
>>
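A minimal sketch of that row layout, assuming delimited text input. The class and method names (RowParser, toRow, toRows) are hypothetical and not taken from the ML code base; it uses only the JDK so it runs without Spark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: turns delimited text lines into String[] rows.
final class RowParser {

    // Splits one line into its fields. The limit of -1 keeps trailing
    // empty fields, so every row keeps the same number of columns.
    static String[] toRow(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }

    // Converts a list of lines into a list of rows.
    static List<String[]> toRows(List<String> lines, String delimiter) {
        List<String[]> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            rows.add(toRow(line, delimiter));
        }
        return rows;
    }

    // Assumption: with Spark, the same per-line function could be applied
    // through JavaRDD.map, e.g. lines.map(l -> RowParser.toRow(l, ",")),
    // which would yield a JavaRDD<String[]>.
}
```

Each String[] then holds one feature vector, with missing values left as empty strings.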
>> On Tue, Aug 4, 2015 at 11:51 AM, Danula Eranjith <[email protected]>
>> wrote:
>>
>>> In other words, what would be the preferred output type for a dataset
>>> that is pre-processed by wrangler?
>>> As I have observed, different algorithms use different JavaRDD types as
>>> input (JavaRDD<String>, JavaRDD<Vector>, etc.).
>>>
>>> On Tue, Aug 4, 2015 at 11:48 AM, Nirmal Fernando <[email protected]>
>>> wrote:
>>>
>>>> Hi Danula,
>>>>
>>>> On Tue, Aug 4, 2015 at 11:47 AM, Danula Eranjith <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Nirmal,
>>>>>
>>>>> In ML, what is the preferred way of keeping data in a single row of
>>>>> JavaRDD?
>>>>>
>>>>
>>>> I didn't quite get your question. Can you elaborate please?
>>>>
>>>>
>>>>>
>>>>> As far as I can tell, it depends on the algorithm being used.
>>>>>
>>>>> Danula
>>>>>
>>>>> On Thu, Jul 30, 2015 at 9:14 AM, Nirmal Fernando <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Danula, I'll send an invite.
>>>>>>
>>>>>> On Wed, Jul 29, 2015 at 10:24 PM, Danula Eranjith <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Nirmal,
>>>>>>>
>>>>>>> I am available after 1.30pm on Tuesday, Wednesday and Thursday.
>>>>>>>
>>>>>>> Danula
>>>>>>>
>>>>>>> On Wed, Jul 29, 2015 at 12:10 PM, Nirmal Fernando <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Danula,
>>>>>>>>
>>>>>>>> Can we arrange a demo/review sometime next week? Please let me
>>>>>>>> know a few time slots.
>>>>>>>>
>>>>>>>> On Thu, Jul 23, 2015 at 11:47 AM, Nirmal Fernando <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Danula.
>>>>>>>>>
>>>>>>>>> On Thu, Jul 23, 2015 at 11:41 AM, Danula Eranjith <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> You can find the source at [1]
>>>>>>>>>> <https://github.com/danula/wso2-ml-wrangler-integration>. I have
>>>>>>>>>> to do some refactoring when integrating to ML.
>>>>>>>>>>
>>>>>>>>>> [1] - https://github.com/danula/wso2-ml-wrangler-integration
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 23, 2015 at 11:31 AM, Nirmal Fernando <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Danula. Please share the current code, if possible.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jul 23, 2015 at 8:41 AM, Danula Eranjith <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I have succeeded in mapping the operations parsed from the
>>>>>>>>>>>> wrangler JavaScript code to the Spark transformations I have
>>>>>>>>>>>> written. I am now working on automating the process.
>>>>>>>>>>>>
>>>>>>>>>>>> The last couple of steps would be changing the wrangler interface
>>>>>>>>>>>> and integrating it into the ML Wizard.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks
>>>>>>>>>>>> Danula
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jul 22, 2015 at 9:31 AM, Nirmal Fernando <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please summarize the current status of the project
>>>>>>>>>>>>> and also the things left to do?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jul 19, 2015 at 11:39 PM, Danula Eranjith <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>> Will use them. I already have some other kaggle datasets as
>>>>>>>>>>>>>> well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sun, Jul 19, 2015 at 11:30 PM, Danula Eranjith <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Nirmal,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Would it be possible to get some sample data sets that are
>>>>>>>>>>>>>>>> more likely to be pre-processed using wrangler? I am
>>>>>>>>>>>>>>>> currently testing my implementations against small and more
>>>>>>>>>>>>>>>> general data sets.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have checked the datasets available at [1]
>>>>>>>>>>>>>>>> <https://github.com/wso2/product-ml/tree/master/modules/samples>
>>>>>>>>>>>>>>>> as well, but there is not much to be processed as they are
>>>>>>>>>>>>>>>> ready to be fed to ML.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>>>>> https://github.com/wso2/product-ml/tree/master/modules/samples
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 16, 2015 at 10:15 PM, Nirmal Fernando <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Danula.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 16, 2015 at 10:07 PM, Danula Eranjith <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sorry for not keeping you in the loop.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> After considering and experimenting with several options,
>>>>>>>>>>>>>>>>>> I am using the JavaScript code generated by wrangler to
>>>>>>>>>>>>>>>>>> implement the operations in Spark. I have used regular
>>>>>>>>>>>>>>>>>> expressions to extract the operations, parameters and
>>>>>>>>>>>>>>>>>> values, and mapped them to the Spark transformations I
>>>>>>>>>>>>>>>>>> previously developed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The code generated by wrangler for certain functions has
>>>>>>>>>>>>>>>>>> nested operations.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (1)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> /* Fill split3  with values from above */
>>>>>>>>>>>>>>>>>> w.add(dw.fill().column(["split3"])
>>>>>>>>>>>>>>>>>> .table(0)
>>>>>>>>>>>>>>>>>> .status("active")
>>>>>>>>>>>>>>>>>> .drop(false)
>>>>>>>>>>>>>>>>>> .direction("down")
>>>>>>>>>>>>>>>>>> .method("copy")
>>>>>>>>>>>>>>>>>> .row(undefined)
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> (2)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> /* Delete  rows where split1 is null */
>>>>>>>>>>>>>>>>>> w.add(dw.filter().column([])
>>>>>>>>>>>>>>>>>> .table(0)
>>>>>>>>>>>>>>>>>> .status("active")
>>>>>>>>>>>>>>>>>> .drop(false)
>>>>>>>>>>>>>>>>>> .row(dw.row().column([])
>>>>>>>>>>>>>>>>>> .table(0)
>>>>>>>>>>>>>>>>>> .status("active")
>>>>>>>>>>>>>>>>>> .drop(false)
>>>>>>>>>>>>>>>>>> .conditions([dw.is_null().column([])
>>>>>>>>>>>>>>>>>> .table(0)
>>>>>>>>>>>>>>>>>> .status("active")
>>>>>>>>>>>>>>>>>> .drop(false)
>>>>>>>>>>>>>>>>>> .lcol("split1")
>>>>>>>>>>>>>>>>>> .value(undefined)
>>>>>>>>>>>>>>>>>> .op_str("is null")
>>>>>>>>>>>>>>>>>> ])
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>> )
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have succeeded in parsing operations similar to (1)
>>>>>>>>>>>>>>>>>> above and am currently working on extending the parser to
>>>>>>>>>>>>>>>>>> handle operations similar to (2).
>>>>>>>>>>>>>>>>>>
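The regular-expression extraction for flat call chains like (1) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the repository: the class name, the two patterns and the method names are hypothetical, and the sketch deliberately does not handle nested calls such as (2).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a flat wrangler call chain such as example (1).
final class WranglerCallParser {

    // Matches the operation constructor, e.g. "dw.fill()" -> "fill".
    private static final Pattern OPERATION = Pattern.compile("dw\\.(\\w+)\\(\\)");

    // Matches one chained setter whose argument has no nested parentheses,
    // e.g. .direction("down") -> name "direction", raw value "\"down\"".
    private static final Pattern PARAMETER = Pattern.compile("\\.(\\w+)\\(([^()]*)\\)");

    // Returns the operation name, or null if the script has none.
    static String operation(String script) {
        Matcher m = OPERATION.matcher(script);
        return m.find() ? m.group(1) : null;
    }

    // Returns the chained parameters in source order, as raw argument text.
    static Map<String, String> parameters(String script) {
        Map<String, String> params = new LinkedHashMap<>();
        Matcher op = OPERATION.matcher(script);
        if (!op.find()) {
            return params;
        }
        Matcher m = PARAMETER.matcher(script);
        int from = op.end(); // start after "dw.<op>()" so it is not read as a parameter
        while (m.find(from)) {
            params.put(m.group(1), m.group(2).trim());
            from = m.end();
        }
        return params;
    }
}
```

For example (1), operation(...) gives "fill" and parameters(...) maps direction to the raw text "down" including its quotes. A nested chain like (2) would need a parser that tracks parenthesis depth, which this sketch leaves out.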
>>>>>>>>>>>>>>>>>> Next step would be automating the process of spark
>>>>>>>>>>>>>>>>>> transformation generation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Jul 15, 2015 at 7:32 PM, Nirmal Fernando <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Please send an update at least every week.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Jul 15, 2015 at 5:51 PM, Supun Sethunga <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Any update on the progress? Did you manage to
>>>>>>>>>>>>>>>>>>>> integrate the transformations with the wrangler?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 2, 2015 at 11:38 AM, Danula Eranjith <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Update on the current progress of the project and
>>>>>>>>>>>>>>>>>>>>> future activities as we discussed at the recent meeting.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Current Progress*
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have completed the phase of creating spark
>>>>>>>>>>>>>>>>>>>>> transformations relevant to operations available in 
>>>>>>>>>>>>>>>>>>>>> wrangler.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Operations implemented
>>>>>>>>>>>>>>>>>>>>> - Fill
>>>>>>>>>>>>>>>>>>>>> - Split
>>>>>>>>>>>>>>>>>>>>> - Drop
>>>>>>>>>>>>>>>>>>>>> - Delete
>>>>>>>>>>>>>>>>>>>>> - Extract
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Future activities*
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> - Modify the wrangler interface to suit the current
>>>>>>>>>>>>>>>>>>>>> implementation
>>>>>>>>>>>>>>>>>>>>> - Automate the process of generating Spark
>>>>>>>>>>>>>>>>>>>>> transformations
>>>>>>>>>>>>>>>>>>>>> - Integrating wrangler to the ML workflow
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Sun, Jun 28, 2015 at 9:31 AM, Danula Eranjith <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> No, we haven't done a review yet.
>>>>>>>>>>>>>>>>>>>>>> It would be great if we could have one so that I can
>>>>>>>>>>>>>>>>>>>>>> discuss with you all and clarify the next steps of the
>>>>>>>>>>>>>>>>>>>>>> implementation, as you mentioned.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Jun 28, 2015 at 9:25 AM, Supun Sethunga <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Did we have a review for the work done so far? If
>>>>>>>>>>>>>>>>>>>>>>> not, shall we have one? We can clear up any doubts
>>>>>>>>>>>>>>>>>>>>>>> and issues as well.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 24, 2015 at 6:42 AM, Nirmal Fernando <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the update, keep them coming.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On a JavaRDD you can perform a collect() to get a
>>>>>>>>>>>>>>>>>>>>>>>> list, AFAIR. Yes, this is costly, since it would load
>>>>>>>>>>>>>>>>>>>>>>>> the whole dataset into memory. So, is this an operation
>>>>>>>>>>>>>>>>>>>>>>>> which involves multiple rows?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 23, 2015 at 2:15 PM, Danula Eranjith <
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hi Supun,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I modified the "Fill" operation to add what you
>>>>>>>>>>>>>>>>>>>>>>>>> mentioned.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I used a workaround to implement certain parts
>>>>>>>>>>>>>>>>>>>>>>>>> of the operations, such as filling with values from
>>>>>>>>>>>>>>>>>>>>>>>>> rows above and below.
>>>>>>>>>>>>>>>>>>>>>>>>> I created a List implementation using the toArray()
>>>>>>>>>>>>>>>>>>>>>>>>> method in JavaRDD and then converted it back to a
>>>>>>>>>>>>>>>>>>>>>>>>> JavaRDD after the operation.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This will be inefficient (in terms of both memory
>>>>>>>>>>>>>>>>>>>>>>>>> and time) when working with very large data sets. But
>>>>>>>>>>>>>>>>>>>>>>>>> I think it's important to have these features included.
>>>>>>>>>>>>>>>>>>>>>>>>> Otherwise a user would be left with a very limited set
>>>>>>>>>>>>>>>>>>>>>>>>> of operations.
>>>>>>>>>>>>>>>>>>>>>>>>>
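The list-based workaround for "fill with values from above" might look roughly like the sketch below. It is an assumption-laden illustration rather than the project's code: FillDown and fillDown are hypothetical names, rows are assumed to be String[] with null or empty strings for missing cells, and the list is assumed to have been materialised on the driver already.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory "fill down": copies the last seen non-empty value
// of a column into the empty cells below it. Row order matters here, which
// is why it runs on an ordered list rather than per row.
final class FillDown {

    static List<String[]> fillDown(List<String[]> rows, int column) {
        List<String[]> filled = new ArrayList<>(rows.size());
        String last = null; // most recent non-empty value seen so far
        for (String[] row : rows) {
            String[] copy = row.clone(); // leave the input rows untouched
            String value = copy[column];
            if (value == null || value.isEmpty()) {
                if (last != null) {
                    copy[column] = last;
                }
            } else {
                last = value;
            }
            filled.add(copy);
        }
        return filled;
    }

    // Assumption: with Spark, the list would come from rdd.collect() and go
    // back through JavaSparkContext.parallelize(...), which is what makes the
    // approach costly for large data sets.
}
```

Cells above the first non-empty value stay empty, since there is nothing to copy from.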
>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if you have a different opinion
>>>>>>>>>>>>>>>>>>>>>>>>> on this.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 16, 2015 at 9:44 AM, Supun Sethunga <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Somehow there are issues in implementing certain
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrangler functions due to limitations in JavaRDD 
>>>>>>>>>>>>>>>>>>>>>>>>>>> used in spark
>>>>>>>>>>>>>>>>>>>>>>>>>>> e.g. -
>>>>>>>>>>>>>>>>>>>>>>>>>>> Fill operation - when filling with values from
>>>>>>>>>>>>>>>>>>>>>>>>>>> rows above and below
>>>>>>>>>>>>>>>>>>>>>>>>>>> Fold operation
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Agreed. Since Spark processes rows in parallel, with no
>>>>>>>>>>>>>>>>>>>>>>>>>> guaranteed order, inter-row operations are not very
>>>>>>>>>>>>>>>>>>>>>>>>>> meaningful.
>>>>>>>>>>>>>>>>>>>>>>>>>> But you can slightly modify the implementation of
>>>>>>>>>>>>>>>>>>>>>>>>>> the "Fill" operation, for example to fill values based
>>>>>>>>>>>>>>>>>>>>>>>>>> on an expression, a static value, the mean, etc. (not
>>>>>>>>>>>>>>>>>>>>>>>>>> depending on other rows).
>>>>>>>>>>>>>>>>>>>>>>>>>>
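A rough sketch of such a row-independent fill, assuming String[] rows with null or empty strings for missing cells. All names (StaticFill, fillRow, columnMean, fillWithMean) are hypothetical. The mean is an aggregate computed once up front; after that, each row can be filled without looking at any other row.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical row-local fill: each row is handled on its own, so the
// result does not depend on the order in which rows are processed.
final class StaticFill {

    static boolean isMissing(String value) {
        return value == null || value.isEmpty();
    }

    // Replaces a missing cell in one row with a fixed replacement value.
    static String[] fillRow(String[] row, int column, String replacement) {
        String[] copy = row.clone();
        if (isMissing(copy[column])) {
            copy[column] = replacement;
        }
        return copy;
    }

    // Mean of the non-missing numeric cells in a column (NaN if there are none).
    static double columnMean(List<String[]> rows, int column) {
        return rows.stream()
                .map(r -> r[column])
                .filter(v -> !isMissing(v))
                .mapToDouble(Double::parseDouble)
                .average()
                .orElse(Double.NaN);
    }

    // Fills every missing cell in the column with the column mean.
    static List<String[]> fillWithMean(List<String[]> rows, int column) {
        String mean = String.valueOf(columnMean(rows, column));
        return rows.stream()
                .map(r -> fillRow(r, column, mean))
                .collect(Collectors.toList());
    }
}
```

Because fillRow only reads its own row and a constant, it is the kind of function that could be passed to a per-row map without collecting the data set first.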
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Jun 16, 2015 at 9:27 AM, Supun Sethunga <
>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late reply. Have you got the
>>>>>>>>>>>>>>>>>>>>>>>>>>> details you were looking for?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great if I could get to know which
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrangler operations are important for a user of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ML
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Other than the ones you have mentioned in the
>>>>>>>>>>>>>>>>>>>>>>>>>>> proposal, I think it's better to have a "Translate"
>>>>>>>>>>>>>>>>>>>>>>>>>>> operation as well (to create a new column based on
>>>>>>>>>>>>>>>>>>>>>>>>>>> an existing column).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Supun
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jun 4, 2015 at 10:11 PM, Danula Eranjith
>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> I am currently working on generating spark
>>>>>>>>>>>>>>>>>>>>>>>>>>>> transformations related to the operations 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> available in the data wrangler.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Data wrangler provides sufficient parameters to
>>>>>>>>>>>>>>>>>>>>>>>>>>>> re-create these in Spark. I have successfully
>>>>>>>>>>>>>>>>>>>>>>>>>>>> implemented the delete and split operations of
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrangler in Spark.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Once this phase is completed, I can either
>>>>>>>>>>>>>>>>>>>>>>>>>>>> directly generate these scripts at wrangler or use 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the javascript output
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and convert it to spark depending on the 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> implementation.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Somehow there are issues in implementing
>>>>>>>>>>>>>>>>>>>>>>>>>>>> certain wrangler functions due to limitations in 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> JavaRDD used in spark
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> e.g. -
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fill operation - when filling with values from
>>>>>>>>>>>>>>>>>>>>>>>>>>>> rows above and below
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fold operation
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be great if I could get to know which
>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrangler operations are important for a user of 
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ML
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 3, 2015 at 8:30 AM, Nirmal Fernando
>>>>>>>>>>>>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please send an update of your work thus far.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, May 10, 2015 at 2:30 PM, Nirmal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Fernando <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Danula,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Welcome to GSoC '15! Can you do some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> research on directly generating Spark
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> transformations using Wrangler and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> come up with a summary?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, May 8, 2015 at 11:03 AM, Danula
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Eranjith <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for selecting my proposal [1]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> <https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> for GSoC 2015. I am really looking forward to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> working with you all and contributing to WSO2.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I have already completed my primary research
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on wrangler and would like to meet you to get
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> feedback on the proposed architecture. I am
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> planning to start working on the project before
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the 25th of May.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Danula
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/18NFa23CrhXqnHrkl_AuRz3sQ3Axg7SEmiA7l66Hl9_0/edit?usp=sharing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks & regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Nirmal
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Associate Technical Lead - Data Technologies
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Team, WSO2 Inc.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Mobile: +94715779733
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> *Supun Sethunga*
>>>>>>>>>>>>>>>>>>>>>>>>>>> Software Engineer
>>>>>>>>>>>>>>>>>>>>>>>>>>> WSO2, Inc.
>>>>>>>>>>>>>>>>>>>>>>>>>>> http://wso2.com/
>>>>>>>>>>>>>>>>>>>>>>>>>>> lean | enterprise | middleware
>>>>>>>>>>>>>>>>>>>>>>>>>>> Mobile : +94 716546324
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>
>>>> --
>>>>
>>>> Thanks & regards,
>>>> Nirmal
>>>>
>>>> Team Lead - WSO2 Machine Learner
>>>> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
>>>> Mobile: +94715779733
>>>> Blog: http://nirmalfdo.blogspot.com/
>>>>
>>>>
>>>>



-- 
*Supun Sethunga*
Software Engineer
WSO2, Inc.
http://wso2.com/
lean | enterprise | middleware
Mobile : +94 716546324
_______________________________________________
Dev mailing list
[email protected]
http://wso2.org/cgi-bin/mailman/listinfo/dev