This is exactly the approach I first had :)
`in_batches` plucks the ids of each batch and then builds the yielded relation 
by adding a where clause on those ids. With 1 million rows, that's 1 
million ids plucked. When plucking bounds only, say for batches of size 
10,000, that's only 200 bounds (1 - 10,000; 10,001 - 20,000; ...; 
990,001 - 1,000,000). On 5 million rows, it takes ~30 seconds to build the 
relations, but only ~5 seconds to build the bounds. I guess we could modify 
`in_batches` to use bounds instead, by yielding relations that apply a 
condition `where primary key between this and that` instead of `where primary 
key in those ids`. This is a bit more prone to race conditions, but those are 
inherent to batching, as the current documentation explains.

On Wednesday, September 5, 2018 at 2:40:51 PM UTC+2, Greg Navis wrote:
>
> Thanks for the explanation. That makes sense.
>
> Your use case is certainly valid but I'm on the fence about whether this 
> should be in Active Record. It should be possible to extract boundaries 
> using #in_batches and #where_values_hash:
>
> relation.in_batches do |relation|
>   min_id, max_id = relation.where_values_hash['id'].minmax
>   # Do something with the boundaries.
> end
>
> I'm not sure whether that warrants a separate method.
>

-- 
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to rubyonrails-core+unsubscr...@googlegroups.com.
To post to this group, send email to rubyonrails-core@googlegroups.com.
Visit this group at https://groups.google.com/group/rubyonrails-core.
For more options, visit https://groups.google.com/d/optout.