Hello :)


*tldr*: For efficiency, allow splitting an ActiveRecord relation into batches by yielding the bounds (lower and upper id) of each batch instead of an enumerator of relations.

This makes it possible, for instance, to split a table into chunks processed by parallel jobs.


*Rationale*: Doing heavy lifting on a 1 000 000-row table requires batches.

Indeed, loading all of the rows at once can exhaust the available memory.
ActiveRecord already offers `in_batches`, `find_in_batches` and `find_each` for this.
However, if processing one batch of 1000 records takes 5 seconds, processing the whole table sequentially takes 1000 x 5 seconds = 5000 seconds, roughly 1h23, which is not an acceptable duration for a job.
What you can do instead is split the table into 1000 batches and process them in parallel, for example via a job queue.
To do so, you need to determine how to split the table into batches.
You could use `find_in_batches`, then enqueue jobs that each take a list of ids as argument.
A more efficient method is to determine the boundaries of the batches, since each job then only receives two integers instead of a list of ids.
For instance, if the ids follow an incremental sequence from 1 to 1 000 000, the boundaries are 1, 1001, 2001, ..., 999 001, 1 000 000.
A batch is then determined by its lower and upper bound.
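A minimal plain-Ruby sketch of that arithmetic, assuming contiguous ids with no gaps (the helper name `contiguous_batch_bounds` is hypothetical and not part of ActiveRecord):

```ruby
# Hypothetical helper: [lower, upper] bounds for a contiguous id range
# min_id..max_id, assuming no gaps in the id sequence.
def contiguous_batch_bounds(min_id, max_id, batch_size)
  (min_id..max_id).step(batch_size).map do |lower|
    # The last batch may be shorter, so clamp its upper bound to max_id.
    [lower, [lower + batch_size - 1, max_id].min]
  end
end

bounds = contiguous_batch_bounds(1, 1_000_000, 1000)
p bounds.size   # prints 1000
p bounds.first  # prints [1, 1000]
p bounds.last   # prints [999001, 1000000]
```

Real tables usually have gaps left by deleted rows; the arithmetic bounds still cover every row, but some batches end up smaller than others, so deriving the bounds from the ids actually present keeps the batches evenly sized.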


*Note*: I have already implemented a working version in my company's codebase.

We really need it, since Heroku shuts down workers after a 30-second grace period, so a long-running job can be killed before it finishes.

We also want to avoid table-level locks while performing bulk updates, and batching helps with that too.


*Example*:

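```ruby
# each_batch_bounds and batch_from_bounds are the methods proposed here;
# they do not exist in ActiveRecord today.
class MyJobLauncher < ApplicationJob
  def perform
    # Enqueue one job per batch, passing only the batch's id bounds.
    User.some_scope.each_batch_bounds(batch_size: 10_000) do |lower_bound, upper_bound|
      MyJob.perform_later(lower_bound, upper_bound)
    end
  end
end

class MyJob < ApplicationJob
  def perform(lower_bound, upper_bound)
    # Rebuild the batch from its bounds, then process it.
    do_some_heavy_lifting_with User.some_scope.batch_from_bounds(lower_bound, upper_bound)
  end
end
```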
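One way the proposed `each_batch_bounds` could work is keyset pagination: fetch the next `batch_size` ids greater than the previous upper bound, then yield the first and last of them. Below is a plain-Ruby sketch of that idea, not the implementation behind this proposal; `fetch_ids` is a hypothetical callable standing in for the database query, and `ids_in_bounds` is a hypothetical stand-in for `batch_from_bounds`.

```ruby
# Hypothetical plain-Ruby stand-in for the proposed each_batch_bounds.
# fetch_ids: callable (after_id, limit) -> up to `limit` sorted ids greater
# than after_id (or the first ids of the table when after_id is nil).
def each_batch_bounds(batch_size:, fetch_ids:)
  last_id = nil
  loop do
    ids = fetch_ids.call(last_id, batch_size)
    break if ids.empty?

    # Only the first and last id of the batch are handed to the caller.
    yield ids.first, ids.last
    last_id = ids.last
    # A short batch means the end of the table was reached.
    break if ids.size < batch_size
  end
end

# Hypothetical stand-in for batch_from_bounds: keep ids within [lower, upper].
def ids_in_bounds(ids, lower, upper)
  ids.select { |id| id.between?(lower, upper) }
end

# Example: ids with gaps, as after some rows were deleted.
ids = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
fetch = ->(after, limit) { ids.select { |id| after.nil? || id > after }.first(limit) }

each_batch_bounds(batch_size: 4, fetch_ids: fetch) do |lower, upper|
  puts "#{lower}..#{upper}: #{ids_in_bounds(ids, lower, upper).inspect}"
end
# Prints:
# 1..5: [1, 2, 3, 5]
# 8..34: [8, 13, 21, 34]
# 55..89: [55, 89]
```

With ActiveRecord, `fetch_ids` could be something like `->(after, limit) { rel = User.some_scope.reorder(:id); rel = rel.where("users.id > ?", after) if after; rel.limit(limit).pluck(:id) }`, and `batch_from_bounds(lower, upper)` could map onto `where(id: lower..upper)`, which generates a `BETWEEN` condition. Only ids travel to Ruby, never full records, and gaps in the id sequence do not unbalance the batches.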

-- 
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
