This is exactly the approach I first had :) `in_batches` plucks the ids for each batch and then builds the yielded relation by adding a `WHERE` clause on those ids. With 1 million rows, that's 1 million ids plucked. When plucking bounds only, say for batches of size 10 000, that's only 200 bounds (1 - 10 000, 10 001 - 20 000, ..., 990 001 - 1 000 000). On 5 million rows, it takes ~30 seconds to build the relations from ids, versus ~5 seconds to build them from bounds. I guess we could modify `in_batches` to use bounds instead, by yielding relations that apply a `WHERE primary_key BETWEEN this AND that` condition instead of `WHERE primary_key IN (those ids)`. This is a bit more prone to race conditions, but those are inherent to batching anyway, as the current documentation explains.
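To make the size difference concrete, here is a minimal plain-Ruby sketch of the bounds idea, with no Active Record involved. `batch_bounds` is a hypothetical helper name, and the `ids` array stands in for the primary keys a real query would scan; the point is just that each batch is reduced to a `[first, last]` pair instead of a full list of ids.

```ruby
# Hypothetical helper: reduce each batch of ids to its lower and upper bound.
# In a real patch these pairs would drive a BETWEEN condition per batch.
def batch_bounds(ids, batch_size)
  ids.each_slice(batch_size).map { |slice| [slice.first, slice.last] }
end

bounds = batch_bounds((1..1_000_000).to_a, 10_000)
bounds.length # => 100 pairs, i.e. 200 numbers instead of 1 million ids
bounds.first  # => [1, 10_000]
bounds.last   # => [990_001, 1_000_000]
```

In a bounds-based `in_batches`, each pair would then yield something like `relation.where(id: lo..hi)` rather than a relation carrying every id of the batch.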
On Wednesday, September 5, 2018 at 2:40:51 PM UTC+2, Greg Navis wrote:

> Thanks for the explanation. That makes sense.
>
> Your use case is certainly valid, but I'm on the fence whether this should be in Active Record. It should be possible to extract the boundaries using `#in_batches` and `#where_values_hash`:
>
>     relation.in_batches do |relation|
>       min_id, max_id = relation.where_values_hash['id'].minmax
>       # Do something with the boundaries.
>     end
>
> I'm not sure whether that warrants a separate method.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.