Hi Srinath,

We have decided to implement only classification first. Once the
classification is complete, we hope to add next-value prediction too.
We are basically writing a program to make sure that the deeplearning4j
library we are using is compatible with the Apache Spark pipeline. We are
also trying to demonstrate all the machine learning steps with that
program.

We are now using the aclImdb sentiment analysis data set to verify the
accuracy of the RNN model we create.
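
As a rough illustration of that accuracy check, here is a minimal
plain-Python sketch. It is not the deeplearning4j or Spark code: the
function names tokenize and accuracy, and the 0/1 encoding of
negative/positive reviews, are assumptions made only for this example.

```python
import re
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    """Lower-case a review and split it into word tokens (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of reviews whose predicted label matches the true label.

    Labels are assumed to be 0 (negative) or 1 (positive).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("cannot compute accuracy on an empty data set")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```

In the real pipeline the predicted labels would come from the trained RNN
model; here they are just lists of integers.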

Thanks
Thamali


On Wed, Mar 2, 2016 at 10:38 AM, Srinath Perera <srin...@wso2.com> wrote:

> Hi Thamali,
>
>
>    1. An RNN can do both classification and next-value prediction. Are we
>    trying to do both?
>    2. When Upul played with it, he had trouble getting the deeplearning4j
>    implementation to work with the next-value prediction scenario. Has
>    that been fixed?
>    3. Which data sets will we use to verify the accuracy of the RNN
>    after integration?
>
>
> --Srinath
>
> On Tue, Mar 1, 2016 at 3:44 PM, Thamali Wijewardhana <tham...@wso2.com>
> wrote:
>
>> Hi,
>>
>> Currently we are working on a project to add the Recurrent Neural
>> Network (RNN) algorithm to Machine Learner. RNN is one of the deep
>> learning algorithms with record-breaking accuracy. For more information
>> on RNN, please refer to link [1].
>>
>> We have decided to use deeplearning4j, which is an open source deep
>> learning library that scales on Spark and Hadoop.
>>
>> Since there is a plan to add the Spark pipeline to Machine Learner, we
>> have decided to apply the Spark pipeline concept in our project.
>>
>> I have designed an architecture for the RNN implementation.
>>
>> This architecture is designed to be compatible with the Spark pipeline.
>>
>> The data set is taken in CSV format and then converted to a Spark data
>> frame, since Apache Spark works mostly with data frames.
>>
>> The next step is a transformer that tokenizes the sequential data. A
>> tokenizer takes a sequence of data and breaks it into individual units.
>> For example, it can be used to break a sentence into words.
>>
>> The next step is again a transformer, used to convert tokens to
>> vectors. This must be done because the features should be added to the
>> Spark pipeline in org.apache.spark.mllib.linalg.VectorUDT format.
>>
>> Next, the transformed data are fed to the data set iterator. This is an
>> object of a class which implements
>> org.deeplearning4j.datasets.iterator.DataSetIterator. The data set
>> iterator traverses a data set and prepares the data for neural networks.
>>
>> The next component is the RNN algorithm model, which is an estimator.
>> The data from the data set iterator is fed to the RNN and a model is
>> generated. This model can then be used for predictions.
>>
>> We have decided to complete this project in two steps:
>>
>>    - First, create a Spark pipeline program containing the steps in
>>    Machine Learner (uploading the data set, generating the model,
>>    calculating accuracy, and prediction) and check whether the project
>>    is feasible.
>>    - Next, add the algorithm to ML.
>>
>> Currently we have almost completed the first step, and we are now
>> collecting more data and tuning the hyperparameters.
>>
>> [1]
>> https://docs.google.com/document/d/1edg1fdKCYR7-B1oOLy2kon179GSs6x2Zx9oSRDn_NEU/edit
>>
>
>
>
> --
> ============================
> Srinath Perera, Ph.D.
>    http://people.apache.org/~hemapani/
>    http://srinathsview.blogspot.com/
>
_______________________________________________
Architecture mailing list
Architecture@wso2.org
https://mail.wso2.org/cgi-bin/mailman/listinfo/architecture
