*An accompanying livebook is attached if you'd prefer to read this proposal 
interactively.*
Introduction

Enum.split_while/2 (“split_while/2“) is one of the few (only?) functions in 
the Enum suite that does not throw away the rest of the Enumerable.t() when 
accumulation halts.

split_while/2 is useful for processing part of a list, but passing the rest 
somewhere else. For example, if I have a list of integers and I only want 
to collect integers within a certain range from the beginning, I can use 
split_while/2 like this:

ints = [0, 4, 3, 18, 20, -7, -4, 6]

{first, rest} = Enum.split_while(ints, &(&1 >= 0 && &1 <= 10))

=> {[0, 4, 3], [18, 20, -7, -4, 6]}

And process them individually:

Enum.sum(first)

=> 7

Enum.product(rest)

=> 60480

Problem Description

However, there is no way to use split_while/2 with state. It will only run 
the predicate function it receives on the current element. Take the 
following pseudocode example:

split_while(ints, 0, fn 
  _int, acc when acc > 10 -> {:halt, acc}
  int, acc -> {:cont, acc + int} 
end)

Here, it is advantageous to use an accumulation of previous results to know 
if we should halt. The remaining list should be returned for further 
processing.

Without state in split_while/2, we have to do something rather inelegant:

result =
  Enum.reduce_while(ints, %{ints: [], sum: 0, encountered: 0}, fn
    _int, %{sum: sum} = acc when sum > 10 ->
      {:halt, %{acc | ints: Enum.reverse(acc.ints)}}

    int, acc ->
      {:cont,
       %{
         acc
         | ints: [int | acc.ints],
           sum: acc.sum + int,
           encountered: acc.encountered + 1
       }}
  end)

=> %{encountered: 4, ints: [0, 4, 3, 18], sum: 25}

{ints, rest} = Enum.split(ints, result.encountered)

=> {[0, 4, 3, 18], [20, -7, -4, 6]}

{:ok, %{ints: ints, sum: result.sum}, rest}

=> {:ok, %{ints: [0, 4, 3, 18], sum: 25}, [20, -7, -4, 6]}

Or write boilerplate code every time this sort of operation is needed. A 
“stateful split while” should be a part of the standard library.

I propose a version of split_while/2 that takes the Enumerable.t(), 
collecting an accumulator until the discriminator function returns {:halt, 
any}. (*Or something like it, see below.*) Here’s the first draft:

EnumExt

Here is an example of how that might be achieved by modifying 
Enum.take_while/2:

defmodule EnumExt do
  @spec split_while_reduce(
          Enumerable.t(),
          any,
          (Enum.element(), any ->
             {:cont, any} | {:halt, any})
        ) ::
          {{[Enum.element()], any}, [Enum.element()]}
  def split_while_reduce(enumerable, initial_state, fun) when 
is_list(enumerable) do
    split_while_reduce_list(enumerable, fun, {[], initial_state})
  end

  def split_while_reduce(enumerable, initial_state, fun) do
    {{list1, state}, list2} =
      Enum.reduce(enumerable, {{[], initial_state}, []}, fn
        entry, {{acc1, state}, []} ->
          case fun.(entry, state) do
            {:cont, new_state} -> {{[entry | acc1], new_state}, []}
            {:halt, new_state} -> {{acc1, new_state}, [entry]}
          end

        entry, {{acc1, state}, acc2} ->
          {{acc1, state}, [entry | acc2]}
      end)

    {{:lists.reverse(list1), state}, :lists.reverse(list2)}
  end

  defp split_while_reduce_list([head | tail], fun, {acc, state}) do
    case fun.(head, state) do
      {:cont, new_state} -> split_while_reduce_list(tail, fun, {[head | 
acc], new_state})
      {:halt, new_state} -> {{:lists.reverse(acc), new_state}, [head | 
tail]}
    end
  end

  defp split_while_reduce_list([], _fun, {acc, state}) do
    {{:lists.reverse(acc), state}, []}
  end
end

=> {:module, EnumExt, <<70, 79, 82, 49, 0, 0, 12, ...>>, 
{:split_while_reduce_list, 3}}

Let’s try it out. We have a list of binaries. We want to halt accumulation 
as soon as our data reaches a size or length of 100:

items =
  [
    <<226, 188, 146, 226, 164, 184, 226, 135, 138, 226, 135, 162, 226, 183, 
169, 226, 168, 166,
      226, 178, 165>>,
    <<226, 161, 181, 226, 134, 162, 226, 168, 189, 226, 133, 128, 226, 149, 
157>>,
    <<226, 140, 176, 226, 156, 180, 226, 151, 182, 226, 190, 150, 226, 138, 
156, 226, 155, 186>>,
    <<226, 173, 134, 226, 185, 134, 226, 179, 176, 226, 156, 163, 226, 139, 
128, 226, 189, 185>>,
    <<226, 156, 188, 226, 171, 187, 226, 163, 182, 226, 133, 145>>,
    <<226, 174, 185, 226, 186, 148, 226, 160, 163, 226, 133, 143, 226, 161, 
147>>,
    <<226, 137, 182, 226, 152, 133, 226, 147, 167, 226, 142, 171, 226, 173, 
153>>,
    <<226, 159, 143, 226, 173, 172, 226, 153, 183, 226, 145, 172>>,
    <<226, 176, 157, 226, 157, 183, 226, 136, 160>>,
    <<226, 181, 140, 226, 152, 172, 226, 162, 154, 226, 153, 172, 226, 137, 
176, 226, 169, 176>>
  ]

=> ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
 "✼⫻⣶⅑", "⮹⺔⠣⅏⡓", "≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠",
 "ⵌ☬⢚♬≰⩰"]

import EnumExt, only: [split_while_reduce: 3]

split_while_reduce(items, %{size: 0, length: 0, string: ""}, fn item, state 
->
  size = byte_size(item)
  length = String.length(item)

  new_size = state.size + size
  new_length = state.length + length

  if new_size < 100 and new_length < 100 do
    {:cont, %{state | size: new_size, length: new_length, string: 
state.string <> item}}
  else
    {:halt, state}
  end
end)

=> {{["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹",
   "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
  %{
    length: 31,
    size: 99,
    string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
  }}, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}

It’s still a little clunky, no? I mean, the part of the list that is being 
accumulated could be collected into a list in the state as well. I'm 
thinking of something similar to how Enum.map_reduce/3 returns a list and 
an accumulator.

In this case, however, the list returned would be the *untouched values*. 
The simplified version:

EnumExt2

defmodule EnumExt2 do
  @spec preserve_reduce(
          Enumerable.t(),
          any,
          (Enum.element(), any ->
             {:cont, any} | {:halt, any})
        ) ::
          {any, [Enum.element()]}
  def preserve_reduce(enumerable, initial_state, fun) when 
is_list(enumerable) do
    preserve_reduce_list(enumerable, fun, initial_state)
  end

  def preserve_reduce(enumerable, initial_state, fun) do
    {state, list} =
      Enum.reduce(enumerable, {initial_state, []}, fn
        entry, {state, []} ->
          case fun.(entry, state) do
            {:cont, new_state} -> {new_state, []}
            {:halt, new_state} -> {new_state, [entry]}
          end

        entry, {state, acc} ->
          {state, [entry | acc]}
      end)

    {state, :lists.reverse(list)}
  end

  defp preserve_reduce_list([head | tail], fun, state) do
    case fun.(head, state) do
      {:cont, new_state} -> preserve_reduce_list(tail, fun, new_state)
      {:halt, new_state} -> {new_state, [head | tail]}
    end
  end

  defp preserve_reduce_list([], _fun, state) do
    {state, []}
  end
end

=> {:module, EnumExt2, <<70, 79, 82, 49, 0, 0, 11, ...>>, 
{:preserve_reduce_list, 3}}

import EnumExt2, only: [preserve_reduce: 3]

initial_state = %{size: 0, length: 0, string: "", collected: []}

preserve_reduce(items, initial_state, fn item, state ->
  size = byte_size(item)
  length = String.length(item)

  new_size = state.size + size
  new_length = state.length + length

  if new_size < 100 and new_length < 100 do
    {:cont,
     %{
       state
       | size: new_size,
         length: new_length,
         string: state.string <> item,
         collected: [item | state.collected]
     }}
  else
    {:halt, %{state | collected: Enum.reverse(state.collected)}}
  end
end)

=> {%{
   collected: ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺",
    "⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"],
   length: 31,
   size: 99,
   string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓"
 }, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]}

*I have no idea why the fourth element in the collected list is displaying 
like that. The source HTML has the correct characters.*

While the developer now has to reverse any list elements they want to keep, 
it gives us a cleaner return. In the case that the developer does not want 
to accumulate the processed list, they just don’t accumulate it.

Conclusion

The standard library would benefit from such a pattern. I’m unsure of a 
good name (split_while_reduce and preserve_reduce kind of seem out of place 
compared to the other names in the library) for such a function. I was 
considering partial_reduce, but that collides with the concept of partial 
functions.

I’d love to hear your feedback or your ideas about how this can already be 
done in the standard library!

Best regards,

Douglas

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/ab2d6114-1d1f-424a-89b6-85b6648080d8n%40googlegroups.com.

Attachment: 2024-08-06_proposal-enum-preserve-reduce-3.output.livemd
Description: Binary data

Attachment: 2024-08-06_proposal-enum-preserve-reduce-3.nooutput.livemd
Description: Binary data

Reply via email to