Github user ygcao commented on the pull request: https://github.com/apache/spark/pull/10152#issuecomment-163517944

Seeing is believing, and comparison is the key. I encourage you to build models from the same text or data set with both my version and the old version, then compare them to see the differences. For my version, use a simple sentence splitter that breaks on periods and question marks. By the way, if your data is not text: any sequence data has natural boundaries, just as text has sentences. For example, the natural boundary of a user session is the time span of continuous operations.
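
To make the two kinds of boundary concrete, here is a minimal Python sketch. It is not the code from the pull request: the function names `split_sentences` and `split_sessions` and the 30-minute inactivity gap are assumptions chosen purely for illustration.

```python
import re


def split_sentences(text):
    """Split text on '.' and '?' and tokenize each sentence on whitespace.

    Hypothetical helper: a deliberately naive splitter, as the comment suggests.
    """
    parts = re.split(r"[.?]", text)
    # Drop empty fragments (e.g. after a trailing period).
    return [part.split() for part in parts if part.strip()]


def split_sessions(events, max_gap_seconds=1800):
    """Group (timestamp, item) pairs, sorted by timestamp, into sessions.

    A new session starts whenever the gap since the previous event exceeds
    max_gap_seconds. The 30-minute default is an assumption, not a value
    taken from the pull request.
    """
    sessions = []
    last_ts = None
    for ts, item in events:
        if last_ts is None or ts - last_ts > max_gap_seconds:
            sessions.append([])
        sessions[-1].append(item)
        last_ts = ts
    return sessions


if __name__ == "__main__":
    print(split_sentences("To see is to believe. Is comparison the key? Yes"))
    print(split_sessions([(0, "view"), (10, "click"), (5000, "view")]))
```

Each inner list produced by either function could then be fed to both model versions as one training sequence, so that the comparison uses identical input.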