Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20146

I think any dataset with a string column gets indexed, as far as I recall. Picking an existing R dataset is just a convenience; we can also make up a few lines of data if that works out better.

As a separate note, the difference in sort order is potentially something we should document, especially if it goes beyond glm, for example into SQL functions too.
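To illustrate the kind of sort-order difference that might be worth documenting, here is a minimal sketch in plain Python (it is not Spark or R code). It assumes the difference in question is between frequency-descending label ordering, which is the default `stringOrderType` of Spark's `StringIndexer`, and the alphabetical ordering R uses for factor levels. The function names and the sample data are made up for the example, and the alphabetical tie-break in the frequency ordering is also an assumption.

```python
from collections import Counter


def frequency_desc_index(values):
    """Map each label to an index, most frequent label first.

    Ties are broken alphabetically here; that tie rule is an
    assumption made for this sketch.
    """
    counts = Counter(values)
    labels = sorted(counts, key=lambda v: (-counts[v], v))
    return {label: i for i, label in enumerate(labels)}


def alphabetical_index(values):
    """Map each label to an index in plain alphabetical order."""
    return {label: i for i, label in enumerate(sorted(set(values)))}


# Hypothetical column: "b" occurs 3 times, "a" twice, "c" once.
values = ["b", "a", "b", "c", "b", "a"]

freq = frequency_desc_index(values)
alpha = alphabetical_index(values)

print(freq)   # {'b': 0, 'a': 1, 'c': 2}
print(alpha)  # {'a': 0, 'b': 1, 'c': 2}
```

With this data the two orderings disagree on which label gets index 0, so a model that treats index 0 as the reference category would end up with a different baseline under each ordering.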