Hi Imesh,

Thanks a lot for the comments; they were really helpful.

On Fri, Apr 22, 2016 at 6:33 AM, Nirmal Fernando <nir...@wso2.com> wrote:
> [Removed architecture@]
>
> Will do.
>
> On Fri, Apr 22, 2016 at 12:05 AM, Yudhanjaya Wijeratne <yudhanj...@wso2.com> wrote:
>
>> Hi Nirmal,
>>
>> Thamali has briefed me on this article. Could you please provide a
>> technical review? I'll do the grammar once you've approved it.
>>
>> Best,
>> Yudha
>>
>> On Apr 21, 2016 6:03 PM, "Thamali Wijewardhana" <tham...@wso2.com> wrote:
>>
>>> Hi,
>>>
>>> I have finished writing the article[1], which compares the
>>> deeplearning4j library with the Keras library for the Recurrent Neural
>>> Network (RNN) algorithm. I have also found the reasons for the low
>>> performance of the deeplearning4j library using Java Flight Recorder
>>> (JFR) and Flame Graphs, and included them in the article.
>>>
>>> [1]
>>> https://docs.google.com/a/wso2.com/document/d/1CGq1y5QBzW6EaHyf-UqAiatxLumb6lo_mRLjYZWD18o/edit?usp=sharing
>>>
>>> Thanks
>>>
>>> On Fri, Apr 8, 2016 at 7:20 PM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I used a dataset with 25,000 rows; its size is 80 MB.
>>>>
>>>> The link to the dataset is:
>>>>
>>>> http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
>>>>
>>>> On Fri, Apr 8, 2016 at 3:07 PM, Srinath Perera <srin...@wso2.com> wrote:
>>>>
>>>>> Thamali, how big is the data set you are using? (Give me a link to
>>>>> the data set as well.)
>>>>>
>>>>> Nirmal, shall we compare the accuracy of RNN vs. Upul's rolling window
>>>>> method?
>>>>>
>>>>> --Srinath
>>>>>
>>>>> On Fri, Apr 8, 2016 at 9:23 AM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I ran the RNN algorithm using the deeplearning4j library and the
>>>>>> Keras Python library. The dataset, hyperparameters, network
>>>>>> architecture, and hardware platform were the same. The time
>>>>>> comparison is given below:
>>>>>>
>>>>>> Deeplearning4j library: 40 minutes per epoch
>>>>>> Keras library: 4 minutes per epoch
>>>>>>
>>>>>> I also compared the accuracies[1].
>>>>>> The deeplearning4j library gives lower accuracy than the Keras library.
>>>>>>
>>>>>> [1]
>>>>>> https://docs.google.com/spreadsheets/d/1-EvC1P7N90k1S_Ly6xVcFlEEKprh7r41Yk8aI6DiSaw/edit#gid=1050346562
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Fri, Apr 1, 2016 at 10:12 AM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have organized a review on Monday (4th of April).
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> On Thu, Mar 31, 2016 at 3:21 PM, Srinath Perera <srin...@wso2.com> wrote:
>>>>>>>
>>>>>>>> Please set up a review. Shall we do it Monday?
>>>>>>>>
>>>>>>>> On Thu, Mar 31, 2016 at 2:15 PM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We have created a Spark program to prove the feasibility of adding
>>>>>>>>> the RNN algorithm to Machine Learner. This program demonstrates
>>>>>>>>> all the steps in Machine Learner:
>>>>>>>>>
>>>>>>>>> - Uploading a dataset
>>>>>>>>> - Selecting the hyperparameters for the model
>>>>>>>>> - Creating an RNN model from the data and training the model
>>>>>>>>> - Calculating the accuracy of the model
>>>>>>>>> - Saving the model (as a serialized object)
>>>>>>>>> - Predicting using the model
>>>>>>>>>
>>>>>>>>> This program is based on deeplearning4j and the Apache Spark
>>>>>>>>> pipeline. Deeplearning4j was used as the deep learning library for
>>>>>>>>> the recurrent neural network algorithm. Because the program had to
>>>>>>>>> be based on the Spark pipeline, the main challenge was using the
>>>>>>>>> deeplearning4j library with it. Every component used in a Spark
>>>>>>>>> pipeline must be compatible with the pipeline. Components that are
>>>>>>>>> not compatible have to be wrapped in an
>>>>>>>>> org.apache.spark.ml.PredictionModel object.
>>>>>>>>>
>>>>>>>>> We have designed a pipeline with a sequence of stages (transformers
>>>>>>>>> and estimators):
>>>>>>>>>
>>>>>>>>> 1. Tokenizer (Transformer): splits each piece of sequential data
>>>>>>>>>    into tokens. (For example, in sentiment analysis, it splits
>>>>>>>>>    text into words.)
>>>>>>>>>
>>>>>>>>> 2. Vectorizer (Transformer): transforms features into vectors.
>>>>>>>>>
>>>>>>>>> 3. RNN algorithm (Estimator): trains on a data frame and produces
>>>>>>>>>    an RNN model.
>>>>>>>>>
>>>>>>>>> 4. RNN model (Transformer): transforms a data frame with features
>>>>>>>>>    into a data frame with predictions.
>>>>>>>>>
>>>>>>>>> The diagrams below explain the stages of the pipeline. The first
>>>>>>>>> diagram illustrates the training usage of the pipeline and the
>>>>>>>>> second illustrates the testing and prediction usage.
>>>>>>>>>
>>>>>>>>> [Diagram 1: training usage of the pipeline]
>>>>>>>>> [Diagram 2: testing and prediction usage of the pipeline]
>>>>>>>>>
>>>>>>>>> I have also tuned the hyperparameters of the RNN model[1] and found
>>>>>>>>> the values that optimize the accuracy of the model. Given below is
>>>>>>>>> the set of hyperparameters relevant to the RNN algorithm and their
>>>>>>>>> tuned values:
>>>>>>>>>
>>>>>>>>> Number of epochs: 10
>>>>>>>>> Number of iterations: 1
>>>>>>>>> Learning rate: 0.02
>>>>>>>>>
>>>>>>>>> We used the aclImdb sentiment analysis data set for this program,
>>>>>>>>> and with the above hyperparameters we achieved 60% accuracy. We
>>>>>>>>> are trying to improve the accuracy and efficiency of our algorithm.
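The four stages described in this message (tokenizer, vectorizer, estimator, fitted model) follow the transformer/estimator pattern. The plain-Python sketch below is only an illustration of that pattern, not the Spark or deeplearning4j API. Every class and function name in it is hypothetical, the vectorizer is a simple bag-of-words count, and a nearest-centroid classifier stands in for the RNN so that the example stays short and self-contained.

```python
from collections import Counter


class Tokenizer:
    """Transformer: splits each text into lowercase word tokens."""

    def transform(self, rows):
        return [dict(row, tokens=row["text"].lower().split()) for row in rows]


class Vectorizer:
    """Transformer: turns tokens into fixed-length bag-of-words counts."""

    def __init__(self, vocabulary):
        self.vocabulary = list(vocabulary)

    def transform(self, rows):
        out = []
        for row in rows:
            counts = Counter(row["tokens"])
            out.append(dict(row, features=[counts[w] for w in self.vocabulary]))
        return out


class CentroidModel:
    """Transformer: adds a 'prediction' column (nearest class centroid)."""

    def __init__(self, centroids):
        self.centroids = centroids

    def transform(self, rows):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        def predict(features):
            return min(self.centroids,
                       key=lambda label: dist(features, self.centroids[label]))

        return [dict(row, prediction=predict(row["features"])) for row in rows]


class CentroidEstimator:
    """Estimator: placeholder for the RNN algorithm; fit() returns a model."""

    def fit(self, rows):
        sums, counts = {}, Counter()
        for row in rows:
            acc = sums.setdefault(row["label"], [0.0] * len(row["features"]))
            for i, value in enumerate(row["features"]):
                acc[i] += value
            counts[row["label"]] += 1
        centroids = {label: [v / counts[label] for v in total]
                     for label, total in sums.items()}
        return CentroidModel(centroids)


def fit_pipeline(stages, rows):
    """Applies transformers in order; replaces each estimator by its fitted model."""
    fitted = []
    for stage in stages:
        if hasattr(stage, "fit"):
            stage = stage.fit(rows)
        rows = stage.transform(rows)
        fitted.append(stage)
    return fitted


def run_pipeline(fitted, rows):
    """Applies the fitted stages (all transformers) to new rows."""
    for stage in fitted:
        rows = stage.transform(rows)
    return rows


train = [
    {"text": "great movie loved it", "label": "pos"},
    {"text": "loved the great cast", "label": "pos"},
    {"text": "terrible boring movie", "label": "neg"},
    {"text": "boring and terrible plot", "label": "neg"},
]
vocab = ["great", "loved", "terrible", "boring"]
fitted = fit_pipeline([Tokenizer(), Vectorizer(vocab), CentroidEstimator()], train)
predictions = run_pipeline(fitted, [{"text": "a great film"}, {"text": "so boring"}])
print([p["prediction"] for p in predictions])
```

Training uses the estimator once to produce the model; testing and prediction reuse the same tokenizer and vectorizer followed by the fitted model, which mirrors the two usages of the pipeline mentioned in the message.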
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>> https://docs.google.com/spreadsheets/d/1Wcta6i2k4Je_5l16wCVlH6zBMNGIb-d7USaWdbrkrSw/edit?ts=56fcdc9b#gid=2118685173
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Fri, Mar 25, 2016 at 10:18 AM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> One of the most important obstacles in machine learning and deep
>>>>>>>>>> learning is getting data into a format that neural nets can
>>>>>>>>>> understand. Neural nets understand vectors. Therefore,
>>>>>>>>>> vectorization is an important part of building neural network
>>>>>>>>>> algorithms.
>>>>>>>>>>
>>>>>>>>>> Canova is a vectorization library for machine learning that is
>>>>>>>>>> associated with the deeplearning4j library. It is designed to
>>>>>>>>>> support all major types of input data, such as text, CSV, image,
>>>>>>>>>> audio, and video.
>>>>>>>>>>
>>>>>>>>>> In our project to add RNN to Machine Learner, we have to use a
>>>>>>>>>> vectorizing component to convert input data to vectors. I think
>>>>>>>>>> Canova is a better option for building a generic vectorizing
>>>>>>>>>> component, and I am researching how to use it for this purpose.
>>>>>>>>>>
>>>>>>>>>> Any suggestions on this are highly appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> On Wed, Mar 2, 2016 at 2:25 PM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Srinath,
>>>>>>>>>>>
>>>>>>>>>>> We have decided to implement only classification first. Once we
>>>>>>>>>>> complete the classification, we hope to do next-value prediction
>>>>>>>>>>> too. We are basically trying to implement a program to make sure
>>>>>>>>>>> that the deeplearning4j library we are using is compatible with
>>>>>>>>>>> the Apache Spark pipeline.
>>>>>>>>>>> We are also trying to demonstrate all the machine learning steps
>>>>>>>>>>> with that program.
>>>>>>>>>>>
>>>>>>>>>>> We are now using the aclImdb sentiment analysis data set to
>>>>>>>>>>> verify the accuracy of the RNN model we create.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Thamali
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <srin...@wso2.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Thamali,
>>>>>>>>>>>>
>>>>>>>>>>>> 1. RNN can do both classification and next-value prediction.
>>>>>>>>>>>>    Are we trying to do both?
>>>>>>>>>>>> 2. When Upul played with it, he had trouble getting the
>>>>>>>>>>>>    deeplearning4j implementation to work with the next-value
>>>>>>>>>>>>    prediction scenario. Is it fixed?
>>>>>>>>>>>> 3. What are the data sets we will use to verify the accuracy
>>>>>>>>>>>>    of RNN after integration?
>>>>>>>>>>>>
>>>>>>>>>>>> --Srinath
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <tham...@wso2.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we are working on a project to add the Recurrent
>>>>>>>>>>>>> Neural Network (RNN) algorithm to Machine Learner. RNN is one
>>>>>>>>>>>>> of the deep learning algorithms with record-breaking accuracy.
>>>>>>>>>>>>> For more information on RNN, please refer to link [1].
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have decided to use deeplearning4j, an open source deep
>>>>>>>>>>>>> learning library that is scalable on Spark and Hadoop.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Since there is a plan to add the Spark pipeline to Machine
>>>>>>>>>>>>> Learner, we have decided to apply the Spark pipeline concept
>>>>>>>>>>>>> to our project.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have designed an architecture for the RNN implementation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This architecture is developed to be compatible with the Spark
>>>>>>>>>>>>> pipeline.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The data set is taken in CSV format and then converted to a
>>>>>>>>>>>>> Spark data frame, since Apache Spark works mostly with data
>>>>>>>>>>>>> frames.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The next step is a transformer that tokenizes the sequential
>>>>>>>>>>>>> data. A tokenizer takes a sequence of data and breaks it into
>>>>>>>>>>>>> individual units. For example, it can be used to break a
>>>>>>>>>>>>> sentence into words.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The next step is again a transformer, used to convert tokens
>>>>>>>>>>>>> to vectors. This must be done because the features should be
>>>>>>>>>>>>> added to the Spark pipeline in
>>>>>>>>>>>>> org.apache.spark.mllib.linalg.VectorUDT format.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Next, the transformed data are fed to the data set iterator.
>>>>>>>>>>>>> This is an object of a class that implements
>>>>>>>>>>>>> org.deeplearning4j.datasets.iterator.DataSetIterator. The data
>>>>>>>>>>>>> set iterator traverses a data set and prepares data for neural
>>>>>>>>>>>>> networks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The next component is the RNN algorithm, which is an estimator.
>>>>>>>>>>>>> The iterated data from the data set iterator is fed to the RNN
>>>>>>>>>>>>> and a model is generated. This model can then be used for
>>>>>>>>>>>>> predictions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We have decided to complete this project in two steps:
>>>>>>>>>>>>>
>>>>>>>>>>>>> - First, create a Spark pipeline program containing the steps
>>>>>>>>>>>>>   in Machine Learner (uploading a dataset, generating a model,
>>>>>>>>>>>>>   calculating accuracy, and prediction) and check whether the
>>>>>>>>>>>>>   project is feasible.
>>>>>>>>>>>>> - Next, add the algorithm to ML.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently we have almost completed the first step, and we are
>>>>>>>>>>>>> now collecting more data and tuning the hyperparameters.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> ============================
>>>>>>>>>>>> Srinath Perera, Ph.D.
>>>>>>>>>>>> http://people.apache.org/~hemapani/
>>>>>>>>>>>> http://srinathsview.blogspot.com/
>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ============================
>>>>>>>> Blog: http://srinathsview.blogspot.com twitter:@srinath_perera
>>>>>>>> Site: http://home.apache.org/~hemapani/
>>>>>>>> Photos: http://www.flickr.com/photos/hemapani/
>>>>>>>> Phone: 0772360902
>>>>>>>>
>
> --
>
> Thanks & regards,
> Nirmal
>
> Team Lead - WSO2 Machine Learner
> Associate Technical Lead - Data Technologies Team, WSO2 Inc.
> Mobile: +94715779733
> Blog: http://nirmalfdo.blogspot.com/
>
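The data set iterator step in the original design message (traverse the data set and prepare data for the network) can be pictured with a small sketch. The class below is hypothetical and does not reproduce the deeplearning4j DataSetIterator interface. It assumes the tokens have already been mapped to integer ids, and it assumes zero-padding with a mask as the way to bring variable-length sequences in a mini-batch to a common length.

```python
class MiniBatchIterator:
    """Hypothetical iterator: yields padded mini-batches of token-id sequences."""

    def __init__(self, sequences, labels, batch_size, pad_id=0):
        if len(sequences) != len(labels):
            raise ValueError("sequences and labels must have the same length")
        self.sequences = sequences
        self.labels = labels
        self.batch_size = batch_size
        self.pad_id = pad_id  # assumption: id 0 is reserved for padding
        self.cursor = 0

    def reset(self):
        """Rewinds to the start of the data set (for example, for a new epoch)."""
        self.cursor = 0

    def __iter__(self):
        self.reset()
        return self

    def __next__(self):
        if self.cursor >= len(self.sequences):
            raise StopIteration
        end = self.cursor + self.batch_size
        seqs = self.sequences[self.cursor:end]
        batch_labels = self.labels[self.cursor:end]
        self.cursor = end
        longest = max(len(s) for s in seqs)
        # Pad every sequence to the longest one in this batch.
        features = [s + [self.pad_id] * (longest - len(s)) for s in seqs]
        # Mask marks real tokens with 1 and padding with 0.
        mask = [[1] * len(s) + [0] * (longest - len(s)) for s in seqs]
        return features, mask, batch_labels


sequences = [[4, 7], [3], [9, 2, 5], [1, 1], [8]]
labels = [1, 0, 1, 0, 1]
batches = list(MiniBatchIterator(sequences, labels, batch_size=2))
for features, mask, batch_labels in batches:
    print(features, mask, batch_labels)
```

Iterating again over the same object starts from the beginning, which is how one pass per epoch could be driven in a training loop.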
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture