GitHub user mpetruska opened a pull request:

    https://github.com/apache/spark/pull/19659

    [SPARK-19668][ML] Multiple NGram sizes

    ## What changes were proposed in this pull request?
    
    [Jira ticket](https://issues.apache.org/jira/browse/SPARK-19668):
    - implements extraction of n-grams of multiple sizes (a range of n values) with `feature.NGram`
    
    ## How was this patch tested?
    
    - unit tests were added for the implementation (`multiSliding` function)
    - test cases were added to `NGramSuite` and `Word2VecSuite`

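The idea behind the change can be illustrated with a minimal Scala sketch: extract every n-gram for each n in a range. It assumes tokens are joined with single spaces, as Spark's existing single-size `NGram` transformer does. The names `multiNGrams` and `minN`, and the output ordering (all n-grams of one size before the next size), are hypothetical illustrations. This is not the PR's `multiSliding` implementation or its API. Only `maxN` appears in the commit messages below.

```scala
object MultiNGramSketch {

  /** Hypothetical helper: every n-gram for each n in [minN, maxN], tokens joined by spaces.
    * Ordering assumption: grouped by size (all minN-grams first, then minN + 1, and so on).
    */
  def multiNGrams(tokens: Seq[String], minN: Int, maxN: Int): Seq[String] = {
    require(minN >= 1, s"minN must be >= 1, got $minN")
    require(maxN >= minN, s"maxN ($maxN) must be >= minN ($minN)")
    (minN to maxN).flatMap { n =>
      // withPartial(false) drops a trailing window shorter than n,
      // so an input with fewer than n tokens yields no n-grams of that size
      tokens.iterator.sliding(n).withPartial(false).map(_.mkString(" ")).toList
    }
  }

  def main(args: Array[String]): Unit = {
    val tokens = Seq("Hi", "I", "heard", "about", "Spark")
    // Prints the five unigrams, then the four bigrams ("Hi I", "I heard", ...)
    multiNGrams(tokens, 1, 2).foreach(println)
  }
}
```

Grouping the output by size keeps the single-size case (`minN == maxN`) identical to sliding a fixed window of width n. Interleaving sizes by token position would be an equally valid design choice.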

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mpetruska/spark SPARK-19668

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19659
    
----
commit 11f26f40b5f3d7325b6da3f7b32b0fd4db0449ee
Author: Mark Petruska <petruska.m...@gmail.com>
Date:   2017-11-04T11:34:39Z

    implements NGram over a range of values

commit 74fee5531a605fcc876dd7864713ba28e0d4ad71
Author: Mark Petruska <petruska.m...@gmail.com>
Date:   2017-11-04T13:04:17Z

    adds tests for multiSliding

commit 0364a56ef6d71939a4db5d81f8ce0e68c1ecd546
Author: Mark Petruska <petruska.m...@gmail.com>
Date:   2017-11-04T13:17:57Z

    adds additional tests for multi-length ngram interactions

commit 96e2a6267002d263927ecb99f5d46642a5c4df4d
Author: Mark Petruska <petruska.m...@gmail.com>
Date:   2017-11-04T13:18:18Z

    adds maxN parameter to NGram examples

----