Behaviors at this level of detail are highly unlikely to ever align exactly across different ML implementations. Seemingly small differences in logic, such as "<" versus "<=" in a comparison, or different random number generators (to say nothing of different implementation languages), accumulate over training to yield different models, even if their overall performance ends up similar. A rough illustration of that last point follows below.
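Here is a minimal sketch of the idea, assuming scikit-learn and NumPy are available; the dataset, parameters, and seeds are arbitrary choices for illustration. Two SGD-based logistic regressions differ only in their random seed, yet the learned weights diverge while held-out accuracy stays close:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Synthetic data; the specifics don't matter for the point being made.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Identical models except for the RNG seed that drives sample shuffling.
m1 = SGDClassifier(loss="log_loss", random_state=1).fit(X_tr, y_tr)
m2 = SGDClassifier(loss="log_loss", random_state=2).fit(X_tr, y_tr)

# The weight vectors differ, but test accuracy is nearly identical:
# small per-step differences accumulate into different models with
# similar overall performance.
print("max coefficient difference:", np.max(np.abs(m1.coef_ - m2.coef_)))
print("accuracies:", m1.score(X_te, y_te), m2.score(X_te, y_te))
```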
> Random forests are a good example. I expected them to be dependent on feature/instance order. However, they are not in Weka, only in scikit-learn and Spark MLlib. There are more such examples, like logistic regression, which exhibits different behavior in all three libraries.
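The scikit-learn behavior mentioned above can be reproduced with a short sketch (assuming scikit-learn and NumPy; the data and hyperparameters are again arbitrary). Training the same forest with the same seed, but with the feature columns reversed, can yield different predictions, because the per-split feature subsampling draws features by column index:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Baseline forest with a fixed seed.
clf_a = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Same data, same seed, but feature columns in reverse order.
perm = np.arange(X.shape[1])[::-1]
clf_b = RandomForestClassifier(n_estimators=50, random_state=42).fit(X[:, perm], y)

pred_a = clf_a.predict(X)
pred_b = clf_b.predict(X[:, perm])  # apply the same permutation at predict time

# Reordering columns changes which features each split considers,
# so the two "identical" forests can disagree on some predictions.
print("predictions identical:", np.array_equal(pred_a, pred_b))
```

Whether the outputs actually diverge depends on the data and on how often splits tie or feature subsampling kicks in, which is consistent with the observation that some libraries (e.g. Weka) happen to be order-insensitive here while others are not.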