GitHub user mengxr commented on the pull request:

https://github.com/apache/spark/pull/12088#issuecomment-204010722

@liancheng My main question is that this PR adds 100 lines of code without introducing any new features to the LibSVM source. The code in `buildReader` now mixes Tungsten internals with parsing code, so maintaining it requires someone who understands both. I'm okay with the changes, but it would be great to think of a way to separate the internals from the data source implementation. Essentially, LIBSVM is a text-based source with a LIBSVM record parser, and it might require two passes over the data.
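To illustrate the kind of separation suggested above, here is a minimal sketch of a LIBSVM record parser that knows nothing about Tungsten or `InternalRow`. It assumes the feature dimension is not known upfront, so one pass computes it and a second pass builds vectors. All names (`LibSVMRecord`, `LibSVMParser`, `parseLine`, `numFeatures`, `toDense`) are hypothetical and are not Spark's actual API; the row-conversion layer would sit on top of this.

```scala
// Hypothetical, Spark-independent LIBSVM parsing layer (illustrative names only).
final case class LibSVMRecord(label: Double, indices: Array[Int], values: Array[Double])

object LibSVMParser {
  // Parses "label idx1:val1 idx2:val2 ..." (1-based indices in the file),
  // returning 0-based indices. Text after '#' and blank lines are skipped.
  def parseLine(line: String): Option[LibSVMRecord] = {
    val content = line.takeWhile(_ != '#').trim
    if (content.isEmpty) None
    else {
      val tokens = content.split("\\s+")
      val label = tokens.head.toDouble
      val (indices, values) = tokens.tail.map { item =>
        val parts = item.split(":")
        (parts(0).toInt - 1, parts(1).toDouble)
      }.unzip
      Some(LibSVMRecord(label, indices, values))
    }
  }

  // Pass 1: scan all records to find the feature dimension (max index + 1).
  def numFeatures(records: Iterator[LibSVMRecord]): Int =
    records.foldLeft(0) { (acc, r) =>
      if (r.indices.isEmpty) acc else math.max(acc, r.indices.max + 1)
    }

  // Pass 2: expand a record into a dense array of the now-known size.
  def toDense(r: LibSVMRecord, n: Int): Array[Double] = {
    val dense = new Array[Double](n)
    r.indices.zip(r.values).foreach { case (i, v) => dense(i) = v }
    dense
  }
}
```

For example, parsing `"1 1:0.5 3:2.0"` and `"0 2:1.0"` yields a dimension of 3, and the first record expands to `Array(0.5, 0.0, 2.0)`. Because the parser is plain Scala, it can be unit-tested without Spark, and only a thin adapter would need to touch Tungsten internals.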