(beam) branch master updated: Add Gemini RunInference example notebook (#38943)

jrmccluskey Fri, 12 Jun 2026 08:03:38 -0700

This is an automated email from the ASF dual-hosted git repository.

jrmccluskey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/master by this push:
     new 7d642a024a4 Add Gemini RunInference example notebook (#38943)
7d642a024a4 is described below

commit 7d642a024a4e9164f72eaf642f95e3725e4d8f7f
Author: Jack McCluskey <[email protected]>
AuthorDate: Fri Jun 12 11:03:21 2026 -0400

    Add Gemini RunInference example notebook (#38943)
---
 .../notebooks/beam-ml/run_inference_gemini.ipynb   | 609 +++++++++++++++++++++
 1 file changed, 609 insertions(+)

diff --git a/examples/notebooks/beam-ml/run_inference_gemini.ipynb b/examples/notebooks/beam-ml/run_inference_gemini.ipynb
new file mode 100644
index 00000000000..2a9731a56a3
--- /dev/null
+++ b/examples/notebooks/beam-ml/run_inference_gemini.ipynb
@@ -0,0 +1,609 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "view-in-github",
+        "colab_type": "text"
+      },
+      "source": [
+        "<a href=\"https://colab.research.google.com/github/jrmccluskey/beam/blob/geminiNotebook/examples/notebooks/beam-ml/run_inference_gemini.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "id": "fFjof1NgAJwu"
+      },
+      "outputs": [],
+      "source": [
+        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+        "\n",
+        "# Licensed to the Apache Software Foundation (ASF) under one\n",
+        "# or more contributor license agreements. See the NOTICE file\n",
+        "# distributed with this work for additional information\n",
+        "# regarding copyright ownership. The ASF licenses this file\n",
+        "# to you under the Apache License, Version 2.0 (the\n",
+        "# \"License\"); you may not use this file except in compliance\n",
+        "# with the License. You may obtain a copy of the License at\n",
+        "#\n",
+        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
+        "#\n",
+        "# Unless required by applicable law or agreed to in writing,\n",
+        "# software distributed under the License is distributed on an\n",
+        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+        "# KIND, either express or implied. See the License for the\n",
+        "# specific language governing permissions and limitations\n",
+        "# under the License"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "A8xNRyZMW1yK"
+      },
+      "source": [
+        "# Apache Beam RunInference with Gemini\n",
+        "\n",
+        "<table align=\"left\">\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_gemini.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+        "  </td>\n",
+        "  <td>\n",
+        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_gemini.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+        "  </td>\n",
+        "</table>\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "HrCtxslBGK8Z"
+      },
+      "source": [
+        "This notebook shows how to use the Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) transform to generate text, images, and audio with [Gemini](https://ai.google.dev/gemini-api/docs).\n",
+        "Apache Beam has built-in support for sending requests to a remotely deployed Gemini model by using the [`GeminiModelHandler`](https://github.com/apache/beam/blob/5307c7138af8b5daba7a5495aba87d53ae9b0ae7/sdks/python/apache_beam/ml/inference/gemini_inference.py#L103) class. This class accepts a custom request function, so you can use any Gemini model accessible through the Gemini API and handle whatever input types and configuration that model needs.\n",
+        "\n",
+        "When you use remote inference with Gemini, consider the following:\n",
+        "1. Sending requests to Gemini incurs cost from Google Cloud. Consider using smaller, less expensive models for experimentation and testing code.\n",
+        "\n",
+        "This notebook demonstrates the following steps:\n",
+        "- Configure a request function for the desired Gemini model\n",
+        "- Set up example inputs\n",
+        "- Run those examples with the built-in GeminiModelHandler and get a prediction inside an Apache Beam pipeline\n",
+        "- Use custom file sinks to save multimodal outputs from image and audio models\n",
+        "\n",
+        "For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "gVCtGOKTHMm4"
+      },
+      "source": [
+        "## Before you begin\n",
+        "Set up your environment and download dependencies."
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "YDHPlMjZRuY0"
+      },
+      "source": [
+        "### Install Apache Beam\n",
+        "To use RunInference with the built-in Gemini model handler, install the Apache Beam SDK version 2.66.0 or later."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "jBakpNZnAhqk"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install apache_beam[interactive,gcp]==2.74.0 --quiet\n",
+        "\n",
+        "# To use the newly installed versions, restart the runtime.\n",
+        "exit()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "X80jy3FqHjK4"
+      },
+      "source": [
+        "### Create a Gemini API Key\n",
+        "This notebook relies on the Gemini API, which requires an API key. Follow the instructions in the [Gemini API Quickstart](https://ai.google.dev/gemini-api/docs/quickstart#before_you_begin) to generate an API key for your account.\n",
+        "\n",
+        "To run the following cell, your API key must be stored in a Colab Secret named `GEMINI_API_KEY`.\n",
+        "\n",
+        "1. Open your Google Colab notebook and click on the 🔑 **Secrets** tab in the left panel.\n",
+        "   \n",
+        "   <img src=\"https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg\" alt=\"You can find the Secrets tab on the left panel.\" width=50%>\n",
+        "\n",
+        "2. Create a new secret with the name `GEMINI_API_KEY`.\n",
+        "3. Copy and paste your API key into the `Value` input box of `GEMINI_API_KEY`.\n",
+        "4. Toggle the button on the left to allow all notebooks access to the secret."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 1,
+      "metadata": {
+        "id": "Kz9sccyGBqz3"
+      },
+      "outputs": [],
+      "source": [
+        "from google.colab import userdata\n",
+        "\n",
+        "GEMINI_API_KEY=userdata.get('GEMINI_API_KEY')"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Import dependencies"
+      ],
+      "metadata": {
+        "id": "wwEvxB2_JO3h"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import os\n",
+        "import wave\n",
+        "\n",
+        "from collections.abc import Iterable\n",
+        "from collections.abc import Sequence\n",
+        "from io import BytesIO\n",
+        "from typing import Any\n",
+        "from typing import cast\n",
+        "\n",
+        "import apache_beam as beam\n",
+        "from apache_beam.io.fileio import FileSink\n",
+        "from apache_beam.io.fileio import WriteToFiles\n",
+        "from apache_beam.io.fileio import default_file_naming\n",
+        "from apache_beam.ml.inference.base import PredictionResult\n",
+        "from apache_beam.ml.inference.base import RunInference\n",
+        "from apache_beam.ml.inference.gemini_inference import GeminiModelHandler\n",
+        "from apache_beam.ml.inference.gemini_inference import generate_from_string\n",
+        "from apache_beam.ml.inference.gemini_inference import generate_image_from_strings_and_images\n",
+        "from apache_beam.options.pipeline_options import PipelineOptions\n",
+        "\n",
+        "from google import genai\n",
+        "from google.genai import types\n",
+        "from PIL import Image"
+      ],
+      "metadata": {
+        "id": "muTP7ywXJVea"
+      },
+      "execution_count": 2,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "0a1zerXycQ0z"
+      },
+      "source": [
+        "## Run the pipeline"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Text Output\n",
+        "This pipeline code takes text as input and receives text as output from Gemini."
+      ],
+      "metadata": {
+        "id": "ERSXxZS-OwBc"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 3,
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 766
+        },
+        "id": "St07XoibcQSb",
+        "outputId": "1ba4ef99-fcbb-441e-f991-bd370ff6f6c1"
+      },
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "application/javascript": [
+              "\n",
+              "        if (typeof window.interactive_beam_jquery == 'undefined') {\n",
+              "          var jqueryScript = document.createElement('script');\n",
+              "          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n",
+              "          jqueryScript.type = 'text/javascript';\n",
+              "          jqueryScript.onload = function() {\n",
+              "            var datatableScript = document.createElement('script');\n",
+              "            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n",
+              "            datatableScript.type = 'text/javascript';\n",
+              "            datatableScript.onload = function() {\n",
+              "              window.interactive_beam_jquery = jQuery.noConflict(true);\n",
+              "              window.interactive_beam_jquery(document).ready(function($){\n",
+              "                \n",
+              "              });\n",
+              "            }\n",
+              "            document.head.appendChild(datatableScript);\n",
+              "          };\n",
+              "          document.head.appendChild(jqueryScript);\n",
+              "        } else {\n",
+              "          window.interactive_beam_jquery(document).ready(function($){\n",
+              "            \n",
+              "          });\n",
+              "        }"
+            ]
+          },
+          "metadata": {}
+        },
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "Input: What is 5+2?, Output: 5 + 2 = 7\n",
+            "Input: Who is the protagonist of Lord of the Rings?, Output: While **Frodo Baggins** is widely considered the main protagonist of *The Lord of the Rings*, J.R.R. Tolkien’s epic is an ensemble piece with multiple characters occupying the role of \"hero.\" \n",
+            "\n",
+            "Depending on how you analyze the story, there are three primary candidates for the protagonist:\n",
+            "\n",
+            "### 1. Frodo Baggins (The Narrative Protagonist)\n",
+            "Frodo is the literal protagonist of the central plot. As the Ring-bearer, he is tasked with the ultimate quest: carrying the One Ring to Mount Doom to destroy it. \n",
+            "* **Why he is the protagonist:** The main conflict of the entire story—the struggle against the corrupting influence of the Ring—takes place inside Frodo’s mind and soul. If Frodo fails, Middle-earth falls. He is the tragic hero who sacrifices his own peace and well-being so that others may live.\n",
+            "\n",
+            "### 2. Samwise Gamgee (The Moral and Thematic Hero)\n",
+            "While Frodo carries the Ring, his loyal servant and friend Samwise Gamgee is often considered the emotional heart of the story.\n",
+            "* **Why he is the protagonist:** Tolkien himself famously referred to Sam as the **\"chief hero\"** of the epic in one of his letters (Letter 131). Sam represents the ordinary, decent person who rises to extraordinary heights through sheer love and loyalty. Without Sam, Frodo would have died in the wilderness, and the quest would have failed. Sam is also the character who gets the final line of the book (\"Well, I'm back\"), signaling that he is the survivor who carries the  [...]
+            "\n",
+            "### 3. Aragorn (The Epic/Classic Protagonist)\n",
+            "If *The Lord of the Rings* were a traditional, classical fantasy, Aragorn would be the undisputed protagonist. \n",
+            "* **Why he is the protagonist:** He has the classic \"Hero’s Journey\" arc—he is the exiled king living in disguise as a Ranger (Strider) who must overcome his doubts, claim his birthright, lead the armies of men, and defeat the dark lord’s armies. The third volume, *The Return of the King*, is named after him. Aragorn dominates the military and political plotlines of the story.\n",
+            "\n",
+            "### Summary\n",
+            "* **Frodo** is the **plot protagonist** (the story is about his quest).\n",
+            "* **Sam** is the **thematic protagonist** (he represents the ultimate virtues of the book: loyalty, hope, and humility).\n",
+            "* **Aragorn** is the **epic protagonist** (he is the traditional hero of the physical battles and political struggle).\n",
+            "Input: What is the air-speed velocity of a laden swallow?, Output: What do you mean? An African or a European swallow?\n",
+            "\n",
+            "***\n",
+            "\n",
+            "If you happen to be the Bridgekeeper from *Monty Python and the Holy Grail*, and you don't want to be cast into the Gorge of Eternal Peril, here is the actual breakdown of the math:\n",
+            "\n",
+            "### 1. The European Swallow (*Hirundo rustica*)\n",
+            "In 2003, software updates and avian physics calculations estimated the airspeed velocity of an **unladen European swallow** to be roughly **11 meters per second** (about **24 miles per hour** or 39 kilometers per hour). \n",
+            "\n",
+            "### 2. Can it be \"Laden\"?\n",
+            "A standard European swallow weighs about 20 grams. A standard coconut weighs about 1 pound (454 grams). \n",
+            "\n",
+            "According to the laws of aerodynamics, a 20-gram bird cannot maintain lift while carrying an object 22 times its own weight. Therefore, a European swallow **cannot be laden with a coconut**—at least, not individually.\n",
+            "\n",
+            "### 3. The \"Two Swallows\" Theory\n",
+            "As suggested by the castle guards in the film, two swallows could theoretically carry a coconut between them using a strand of creeper. \n",
+            "\n",
+            "However, they would have to have it \"simple, under the dorsal guiding feather,\" which opens up a whole new set of aerodynamic complications regarding synchronization and drag. \n",
+            "\n",
+            "### 4. The African Swallow\n",
+            "While the African swallow is non-migratory (and thus unlikely to bring a coconut to England anyway), it is slightly larger. However, it still wouldn't be large enough to carry a coconut solo.\n"
+          ]
+        }
+      ],
+      "source": [
+        "model_handler = GeminiModelHandler(\n",
+        "    model_name = 'gemini-3.5-flash',\n",
+        "    request_fn=generate_from_string,\n",
+        "    api_key=GEMINI_API_KEY,\n",
+        ")\n",
+        "\n",
+        "inputs: list[str] = [\n",
+        "    \"What is 5+2?\",\n",
+        "    \"Who is the protagonist of Lord of the Rings?\",\n",
+        "    \"What is the air-speed velocity of a laden swallow?\"\n",
+        "]\n",
+        "\n",
+        "class PostProcessor(beam.DoFn):\n",
+        "  def process(self, element: PredictionResult) -> Iterable[str]:\n",
+        "    for part in element.inference.parts:\n",
+        "      try:\n",
+        "        output_text = part.text\n",
+        "        yield f\"Input: {element.example}, Output: {output_text}\"\n",
+        "      except Exception as e:\n",
+        "        print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+        "        raise e\n",
+        "\n",
+        "with beam.Pipeline() as p:\n",
+        "    _ = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+        "           | \"Query Gemini\" >> RunInference(model_handler)\n",
+        "           | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+        "           | \"Print output\" >> beam.Map(print)\n",
+        "        )"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Image Output\n",
+        "This pipeline code takes text as input and produces generated images as output from Gemini."
+      ],
+      "metadata": {
+        "id": "k__Aja9FPLaf"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "model_handler = GeminiModelHandler(\n",
+        "    model_name = 'gemini-3.1-flash-image', # Nano Banana 2\n",
+        "    request_fn=generate_from_string,\n",
+        "    api_key=GEMINI_API_KEY,\n",
+        ")\n",
+        "\n",
+        "class PostProcessor(beam.DoFn):\n",
+        "  def process(self, element: PredictionResult) -> Iterable[Image.Image]:\n",
+        "    try:\n",
+        "      response = element.inference\n",
+        "      for part in response.parts:\n",
+        "        if part.text is not None:\n",
+        "          print(part.text)\n",
+        "        elif part.inline_data is not None:\n",
+        "          image = Image.open(BytesIO(part.inline_data.data))\n",
+        "          yield image\n",
+        "    except Exception as e:\n",
+        "      print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+        "      raise e\n",
+        "\n",
+        "\n",
+        "class ImageSink(FileSink):\n",
+        "  def open(self, fh) -> None:\n",
+        "    self._fh = fh\n",
+        "\n",
+        "  def write(self, record):\n",
+        "    record.save(self._fh, format='PNG')\n",
+        "\n",
+        "  def flush(self):\n",
+        "    self._fh.flush()\n",
+        "\n",
+        "\n",
+        "inputs: list[str] = [\n",
+        "    \"Create a picture of a pineapple in the sand on the beach.\",\n",
+        "]\n",
+        "\n",
+        "with beam.Pipeline() as p:\n",
+        "    output = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+        "           | \"Query Gemini\" >> RunInference(model_handler)\n",
+        "           | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+        "           | \"WriteOutput\" >> WriteToFiles(\n",
+        "                                path='tmp/',\n",
+        "                                file_naming=default_file_naming(\"gemini-image\", \".png\"),\n",
+        "                                sink=ImageSink())\n",
+        "        )\n",
+        "    _ = output | \"Print output\" >> beam.Map(print)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "_C-VajNGOVYs",
+        "outputId": "f44dde53-117b-4d6a-85da-dafdca7839fb"
+      },
+      "execution_count": 4,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "FileResult(file_name='gemini-image-00000-of-00001.png', shard_index=0, total_shards=1, window=GlobalWindow, pane=None, destination=None)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Image outputs generally need to be saved somewhere before being viewable, so we will load the file created by the pipeline to render it. Because we only generated one file and specified a naming convention, we know the static file path to our image."
+      ],
+      "metadata": {
+        "id": "RBACanDTSrtB"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "with Image.open(\"tmp/gemini-image-00000-of-00001.png\") as image:\n",
+        "  display(image)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 785
+        },
+        "id": "jsU10BL_QvNL",
+        "outputId": "be2247aa-96cf-4625-f14c-c668a40734c5"
+      },
+      "execution_count": 5,
+      "outputs": [
+        {
+          "output_type": "display_data",
+          "data": {
+            "text/plain": [
+              "<PIL.PngImagePlugin.PngImageFile image mode=RGB size=1408x768>"
+            ],
+            "image/png": 
"iVBORw0KGgoAAAANSUhEUgAABYAAAAMACAIAAAASU1SbAAEAAElEQVR4AVTd2Y4lS5+n5ZgjMvdXVd33fwuccAQHSEgtEFJLDKIRk+AOEIiub++MOXjen63IKnznXuHL3Ow/T2Zu7uv63/0P/9v7+/vt9d319fXV5/vn9dXX5/XX3c311f3VlbOvm6uv66+PD59X9/7X7etLy+dVx41DCwg3d7faP9/frm9rfP/6vP0C4er149Xn3c3N3d3dx1vnt7f1/Pj69On8ANT+/nn1+fl5f3sbtAP2sz7hdvXj2t+v9w8Uwvv69nZ9dw/X+9vX6+eHq88fb58f1y8f74Bof7y9Cf6V3l93V9cPD3e3Ef5xd3N7c3v9eIO2r7vrN9xcabi9vfm4jpbIw5GeXw/XD8hAFLk83N7hDVjA7+5uvm6uPz4+UHX7dfPx9fX8/vX+9fXn8+dfb28I/Ov96/
 [...]
+            "image/jpeg": 
"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAMABYADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKzt
 [...]
+          },
+          "metadata": {}
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "### Use a custom request function\n",
+        "Writing a custom request function to send arbitrary input to a model is relatively straightforward, as Gemini's API shares commonalities across each model type. Let's try a text-to-speech model."
+      ],
+      "metadata": {
+        "id": "fGHwV5z_Umi6"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "tts_config = types.GenerateContentConfig(\n",
+        "      response_modalities=[\"AUDIO\"],\n",
+        "      speech_config=types.SpeechConfig(\n",
+        "         voice_config=types.VoiceConfig(\n",
+        "            prebuilt_voice_config=types.PrebuiltVoiceConfig(\n",
+        "               voice_name='Charon',\n",
+        "            )\n",
+        "         )\n",
+        "      ),\n",
+        "   )\n",
+        "\n",
+        "def generate_tts_from_string(\n",
+        "    model_name: str,\n",
+        "    batch: Sequence[str],\n",
+        "    model: genai.Client,\n",
+        "    inference_args: dict[str, Any]\n",
+        "):\n",
+        "  return model.models.generate_content(model=model_name,\n",
+        "                                       contents=cast(Any, batch),\n",
+        "                                       config = tts_config)\n",
+        "\n",
+        "model_handler = GeminiModelHandler(\n",
+        "    model_name = 'gemini-3.1-flash-tts-preview',\n",
+        "    request_fn=generate_tts_from_string,\n",
+        "    api_key=GEMINI_API_KEY,\n",
+        ")\n",
+        "\n",
+        "class PostProcessor(beam.DoFn):\n",
+        "  def process(self, element: PredictionResult) -> Iterable[Any]:\n",
+        "    try:\n",
+        "      response = element.inference\n",
+        "      for part in response.parts:\n",
+        "        if part.text is not None:\n",
+        "          print(part.text)\n",
+        "        elif part.inline_data is not None:\n",
+        "          yield part.inline_data\n",
+        "    except Exception as e:\n",
+        "      print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+        "      raise e\n",
+        "\n",
+        "\n",
+        "class AudioSink(FileSink):\n",
+        "  def open(self, fh) -> None:\n",
+        "    self._fh = fh\n",
+        "\n",
+        "  def write(self, record):\n",
+        "    with wave.open(self._fh, 'wb') as f:\n",
+        "      f.setnchannels(1)\n",
+        "      f.setsampwidth(2)\n",
+        "      f.setframerate(16000)\n",
+        "      f.writeframes(record.data)\n",
+        "\n",
+        "  def flush(self):\n",
+        "    self._fh.flush()\n",
+        "\n",
+        "inputs: list[str] = [\n",
+        "    \"Say 'Hello, World!'\",\n",
+        "]\n",
+        "\n",
+        "with beam.Pipeline() as p:\n",
+        "    output = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+        "           | \"Query Gemini\" >> RunInference(model_handler)\n",
+        "           | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+        "           | \"WriteOutput\" >> WriteToFiles(\n",
+        "                                path='tmp/',\n",
+        "                                file_naming=default_file_naming(\"gemini-audio\", \".wav\"),\n",
+        "                                sink=AudioSink())\n",
+        "        )\n",
+        "    _ = output | \"Print output\" >> beam.Map(print)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/"
+        },
+        "id": "1PGdvF5FVEmI",
+        "outputId": "8c48f6e0-a68d-4f95-dc9f-816984f7d314"
+      },
+      "execution_count": 6,
+      "outputs": [
+        {
+          "output_type": "stream",
+          "name": "stdout",
+          "text": [
+            "FileResult(file_name='gemini-audio-00000-of-00001.wav', shard_index=0, total_shards=1, window=GlobalWindow, pane=None, destination=None)\n"
+          ]
+        }
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "Similar to the image output, we can load the audio output once the pipeline is complete."
+      ],
+      "metadata": {
+        "id": "mvvImVFLdOts"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from IPython.display import Audio\n",
+        "Audio(filename='tmp/gemini-audio-00000-of-00001.wav', autoplay=False)"
+      ],
+      "metadata": {
+        "colab": {
+          "base_uri": "https://localhost:8080/",
+          "height": 75
+        },
+        "id": "yOMW_1mZbd-o",
+        "outputId": "9ad91686-3d01-4512-9d88-c09338530e81"
+      },
+      "execution_count": 7,
+      "outputs": [
+        {
+          "output_type": "execute_result",
+          "data": {
+            "text/plain": [
+              "<IPython.lib.display.Audio object>"
+            ],
+            "text/html": [
+              "\n",
+              "                <audio  controls=\"controls\" >\n",
+              "                    <source 
src=\"data:audio/x-wav;base64,UklGRiQOAQBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YQAOAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAP////////////////////////////////////8AAAAAAAA
 [...]
+              "                    Your browser does not support the audio element.\n",
+              "                </audio>\n",
+              "              "
+            ]
+          },
+          "metadata": {},
+          "execution_count": 7
+        }
+      ]
+    }
+  ],
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "include_colab_link": true
+    },
+    "kernelspec": {
+      "display_name": "Python 3",
+      "name": "python3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
\ No newline at end of file
