*An accompanying livebook is attached if you'd prefer to read this proposal interactively.* Introduction
Enum.split_while/2 (“split_while/2“) is one of the few (only?) functions in the Enum suite that does not throw away the rest of the Enumerable.t() when accumulation halts. split_while/2 is useful for processing part of a list, but passing the rest somewhere else. For example, if I have a list of integers and I only want to collect integers within a certain range from the beginning, I can use split_while/2 like this: ints = [0, 4, 3, 18, 20, -7, -4, 6] {first, rest} = Enum.split_while(ints, &(&1 >= 0 && &1 <= 10)) => {[0, 4, 3], [18, 20, -7, -4, 6]} And process them individually: Enum.sum(first) => 7 Enum.product(rest) => 60480 Problem Description However, there is no way to use split_while/2 with state. It will only run the predicate function it receives on the current element. Take the following pseudocode example: split_while(ints, 0, fn _int, acc when acc > 10 -> {:halt, acc} int, acc -> {:cont, acc + int} end) Here, it is advantageous to use an accumulation of previous results to know if we should halt. The remaining list should be returned for further processing. Without state in split_while/2, we have to do something rather inelegant: result = Enum.reduce_while(ints, %{ints: [], sum: 0, encountered: 0}, fn _int, %{sum: sum} = acc when sum > 10 -> {:halt, %{acc | ints: Enum.reverse(acc.ints)}} int, acc -> {:cont, %{ acc | ints: [int | acc.ints], sum: acc.sum + int, encountered: acc.encountered + 1 }} end) => %{encountered: 4, ints: [0, 4, 3, 18], sum: 25} {ints, rest} = Enum.split(ints, result.encountered) => {[0, 4, 3, 18], [20, -7, -4, 6]} {:ok, %{ints: ints, sum: result.sum}, rest} => {:ok, %{ints: [0, 4, 3, 18], sum: 25}, [20, -7, -4, 6]} Or write boilerplate code every time this sort of operation is needed. A “stateful split while” should be a part of the standard library. I propose a version of split_while/2 that takes the Enumerable.t(), collecting an accumulator until the discriminator function returns {:halt, any}. (*Or something like it, see below.*) Here’s the first draft: EnumExt Here is an example of how that might be achieved by modifying Enum.take_while/2: defmodule EnumExt do @spec split_while_reduce( Enumerable.t(), any, (Enum.element(), any -> {:cont, any} | {:halt, any}) ) :: {{[Enum.element()], any}, [Enum.element()]} def split_while_reduce(enumerable, initial_state, fun) when is_list(enumerable) do split_while_reduce_list(enumerable, fun, {[], initial_state}) end def split_while_reduce(enumerable, initial_state, fun) do {{list1, state}, list2} = Enum.reduce(enumerable, {{[], initial_state}, []}, fn entry, {{acc1, state}, []} -> case fun.(entry, state) do {:cont, new_state} -> {{[entry | acc1], new_state}, []} {:halt, new_state} -> {{acc1, new_state}, [entry]} end entry, {{acc1, state}, acc2} -> {{acc1, state}, [entry | acc2]} end) {{:lists.reverse(list1), state}, :lists.reverse(list2)} end defp split_while_reduce_list([head | tail], fun, {acc, state}) do case fun.(head, state) do {:cont, new_state} -> split_while_reduce_list(tail, fun, {[head | acc], new_state}) {:halt, new_state} -> {{:lists.reverse(acc), new_state}, [head | tail]} end end defp split_while_reduce_list([], _fun, {acc, state}) do {{:lists.reverse(acc), state}, []} end end => {:module, EnumExt, <<70, 79, 82, 49, 0, 0, 12, ...>>, {:split_while_reduce_list, 3}} Let’s try it out. We have a list of binaries. We want to halt accumulation as soon as our data reaches a size or length of 100: items = [ <<226, 188, 146, 226, 164, 184, 226, 135, 138, 226, 135, 162, 226, 183, 169, 226, 168, 166, 226, 178, 165>>, <<226, 161, 181, 226, 134, 162, 226, 168, 189, 226, 133, 128, 226, 149, 157>>, <<226, 140, 176, 226, 156, 180, 226, 151, 182, 226, 190, 150, 226, 138, 156, 226, 155, 186>>, <<226, 173, 134, 226, 185, 134, 226, 179, 176, 226, 156, 163, 226, 139, 128, 226, 189, 185>>, <<226, 156, 188, 226, 171, 187, 226, 163, 182, 226, 133, 145>>, <<226, 174, 185, 226, 186, 148, 226, 160, 163, 226, 133, 143, 226, 161, 147>>, <<226, 137, 182, 226, 152, 133, 226, 147, 167, 226, 142, 171, 226, 173, 153>>, <<226, 159, 143, 226, 173, 172, 226, 153, 183, 226, 145, 172>>, <<226, 176, 157, 226, 157, 183, 226, 136, 160>>, <<226, 181, 140, 226, 152, 172, 226, 162, 154, 226, 153, 172, 226, 137, 176, 226, 169, 176>> ] => ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓", "≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"] import EnumExt, only: [split_while_reduce: 3] split_while_reduce(items, %{size: 0, length: 0, string: ""}, fn item, state -> size = byte_size(item) length = String.length(item) new_size = state.size + size new_length = state.length + length if new_size < 100 and new_length < 100 do {:cont, %{state | size: new_size, length: new_length, string: state.string <> item}} else {:halt, state} end end) => {{["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"], %{ length: 31, size: 99, string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓" }}, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]} It’s still a little clunky, no? I mean, the part of the list that is being accumulated could be collected into a list in the state as well. I'm thinking of something similar to how Enum.map_reduce/3 returns a list and an accumulator. In this case, however, the list returned would be the *untouched values*. The simplified version: EnumExt2 defmodule EnumExt2 do @spec preserve_reduce( Enumerable.t(), any, (Enum.element(), any -> {:cont, any} | {:halt, any}) ) :: {any, [Enum.element()]} def preserve_reduce(enumerable, initial_state, fun) when is_list(enumerable) do preserve_reduce_list(enumerable, fun, initial_state) end def preserve_reduce(enumerable, initial_state, fun) do {state, list} = Enum.reduce(enumerable, {initial_state, []}, fn entry, {state, []} -> case fun.(entry, state) do {:cont, new_state} -> {new_state, []} {:halt, new_state} -> {new_state, [entry]} end entry, {state, acc} -> {state, [entry | acc]} end) {state, :lists.reverse(list)} end defp preserve_reduce_list([head | tail], fun, state) do case fun.(head, state) do {:cont, new_state} -> preserve_reduce_list(tail, fun, new_state) {:halt, new_state} -> {new_state, [head | tail]} end end defp preserve_reduce_list([], _fun, state) do {state, []} end end => {:module, EnumExt2, <<70, 79, 82, 49, 0, 0, 11, ...>>, {:preserve_reduce_list, 3}} import EnumExt2, only: [preserve_reduce: 3] initial_state = %{size: 0, length: 0, string: "", collected: []} preserve_reduce(items, initial_state, fn item, state -> size = byte_size(item) length = String.length(item) new_size = state.size + size new_length = state.length + length if new_size < 100 and new_length < 100 do {:cont, %{ state | size: new_size, length: new_length, string: state.string <> item, collected: [item | state.collected] }} else {:halt, %{state | collected: Enum.reverse(state.collected)}} end end) => {%{ collected: ["⼒⤸⇊⇢ⷩ⨦ⲥ", "⡵↢⨽⅀╝", "⌰✴◶⾖⊜⛺", "⭆⹆⳰✣⋀⽹", "✼⫻⣶⅑", "⮹⺔⠣⅏⡓"], length: 31, size: 99, string: "⼒⤸⇊⇢ⷩ⨦ⲥ⡵↢⨽⅀╝⌰✴◶⾖⊜⛺⭆⹆⳰✣⋀⽹✼⫻⣶⅑⮹⺔⠣⅏⡓" }, ["≶★ⓧ⎫⭙", "⟏⭬♷⑬", "Ⱍ❷∠", "ⵌ☬⢚♬≰⩰"]} *I have no idea why the fourth element in the collected list is displaying like that. The source HTML has the correct characters.* While the developer now has to reverse any list elements they want to keep, it gives us a cleaner return. In the case that the developer does not want to accumulate the processed list, they just don’t accumulate it. Conclusion The standard library would benefit from such a pattern. I’m unsure of a good name (split_while_reduce and preserve_reduce kind of seem out of place compared to the other names in the library) for such a function. I was considering partial_reduce, but that collides with the concept of partial functions. I’d love to hear your feedback or your ideas about how this can already be done in the standard library! Best regards, Douglas -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/ab2d6114-1d1f-424a-89b6-85b6648080d8n%40googlegroups.com.
2024-08-06_proposal-enum-preserve-reduce-3.output.livemd
Description: Binary data
2024-08-06_proposal-enum-preserve-reduce-3.nooutput.livemd
Description: Binary data