This is an automated email from the ASF dual-hosted git repository.
jrmccluskey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 7d642a024a4 Add Gemini RunInference example notebook (#38943)
7d642a024a4 is described below
commit 7d642a024a4e9164f72eaf642f95e3725e4d8f7f
Author: Jack McCluskey <[email protected]>
AuthorDate: Fri Jun 12 11:03:21 2026 -0400
Add Gemini RunInference example notebook (#38943)
---
.../notebooks/beam-ml/run_inference_gemini.ipynb | 609 +++++++++++++++++++++
1 file changed, 609 insertions(+)
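
The notebook's custom request function, `generate_tts_from_string`, takes four arguments: the model name, the batch of inputs, a `genai.Client`, and the inference arguments. It then returns the model response. Judging from that signature, `GeminiModelHandler` calls `request_fn` in that order.

The sketch below is a hypothetical variation and is not part of Beam. `make_request_fn` binds a generation config with a closure instead of reading the module-level `tts_config` global, so one helper can serve several handlers with different voices or modalities. It assumes only that the client exposes `models.generate_content(model=..., contents=..., config=...)`, which is the call the notebook makes.

```python
from collections.abc import Callable, Sequence
from typing import Any

# Request-function shape inferred from the notebook's generate_tts_from_string:
# (model_name, batch, client, inference_args) -> response.
RequestFn = Callable[[str, Sequence[Any], Any, dict[str, Any]], Any]


def make_request_fn(config: Any) -> RequestFn:
    """Hypothetical helper: bind a generation config into a request function."""

    def request_fn(
        model_name: str,
        batch: Sequence[Any],
        model: Any,  # a google.genai Client in the notebook
        inference_args: dict[str, Any],
    ) -> Any:
        # Send the whole batch in one generate_content call, as the notebook does.
        # inference_args is accepted for signature compatibility but unused here.
        return model.models.generate_content(
            model=model_name, contents=list(batch), config=config)

    return request_fn
```

In the notebook's terms, the handler could then be built as `GeminiModelHandler(model_name='gemini-3.1-flash-tts-preview', request_fn=make_request_fn(tts_config), api_key=GEMINI_API_KEY)`.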
diff --git a/examples/notebooks/beam-ml/run_inference_gemini.ipynb
b/examples/notebooks/beam-ml/run_inference_gemini.ipynb
new file mode 100644
index 00000000000..2a9731a56a3
--- /dev/null
+++ b/examples/notebooks/beam-ml/run_inference_gemini.ipynb
@@ -0,0 +1,609 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ "<a href=\"https://colab.research.google.com/github/jrmccluskey/beam/blob/geminiNotebook/examples/notebooks/beam-ml/run_inference_gemini.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "fFjof1NgAJwu"
+ },
+ "outputs": [],
+ "source": [
+ "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
+ "\n",
+ "# Licensed to the Apache Software Foundation (ASF) under one\n",
+ "# or more contributor license agreements. See the NOTICE file\n",
+ "# distributed with this work for additional information\n",
+ "# regarding copyright ownership. The ASF licenses this file\n",
+ "# to you under the Apache License, Version 2.0 (the\n",
+ "# \"License\"); you may not use this file except in compliance\n",
+ "# with the License. You may obtain a copy of the License at\n",
+ "#\n",
+ "# http://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing,\n",
+ "# software distributed under the License is distributed on an\n",
+ "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
+ "# KIND, either express or implied. See the License for the\n",
+ "# specific language governing permissions and limitations\n",
+ "# under the License"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A8xNRyZMW1yK"
+ },
+ "source": [
+ "# Apache Beam RunInference with Gemini\n",
+ "\n",
+ "<table align=\"left\">\n",
+ "  <td>\n",
+ "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_gemini.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
+ "  </td>\n",
+ "  <td>\n",
+ "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_inference_gemini.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
+ "  </td>\n",
+ "</table>\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HrCtxslBGK8Z"
+ },
+ "source": [
+ "This notebook shows how to use the Apache Beam [RunInference](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.RunInference) transform to generate text, images, and audio with [Gemini](https://ai.google.dev/gemini-api/docs).\n",
+ "Apache Beam has built-in support for sending requests to a remotely deployed Gemini model by using the [`GeminiModelHandler`](https://github.com/apache/beam/blob/5307c7138af8b5daba7a5495aba87d53ae9b0ae7/sdks/python/apache_beam/ml/inference/gemini_inference.py#L103) class. This class accepts a custom request function, so you can use any Gemini model available through the Gemini API and handle whatever input types and configuration that model requires.\n",
+ "\n",
+ "When you use remote inference with the Gemini API, consider the following factors:\n",
+ "1. Sending requests to Gemini can incur costs. Consider using smaller, less expensive models for experimentation and testing code.\n",
+ "\n",
+ "This notebook demonstrates the following steps:\n",
+ "- Configure a request function for the desired Gemini model\n",
+ "- Set up example inputs\n",
+ "- Run those examples with the built-in `GeminiModelHandler` and get a prediction inside an Apache Beam pipeline\n",
+ "- Use custom file sinks to save multimodal outputs from image and audio models\n",
+ "\n",
+ "For more information about using RunInference, see [Get started with AI/ML pipelines](https://beam.apache.org/documentation/ml/overview/) in the Apache Beam documentation."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gVCtGOKTHMm4"
+ },
+ "source": [
+ "## Before you begin\n",
+ "Set up your environment and download dependencies."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YDHPlMjZRuY0"
+ },
+ "source": [
+ "### Install Apache Beam\n",
+ "To use RunInference with the built-in Gemini model handler, install the Apache Beam SDK version 2.66.0 or later."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "jBakpNZnAhqk"
+ },
+ "outputs": [],
+ "source": [
+ "!pip install apache_beam[interactive,gcp]==2.74.0 --quiet\n",
+ "\n",
+ "# To use the newly installed versions, restart the runtime.\n",
+ "exit()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "X80jy3FqHjK4"
+ },
+ "source": [
+ "### Create a Gemini API Key\n",
+ "This notebook relies on the Gemini API, which requires an API key. Follow the instructions in the [Gemini API Quickstart](https://ai.google.dev/gemini-api/docs/quickstart#before_you_begin) to generate an API key for your account.\n",
+ "\n",
+ "To run the following cell, your API key must be stored in a Colab Secret named `GEMINI_API_KEY`.\n",
+ "\n",
+ "1. Open your Google Colab notebook and click the 🔑 **Secrets** tab in the left panel.\n",
+ "   \n",
+ "   <img src=\"https://storage.googleapis.com/generativeai-downloads/images/secrets.jpg\" alt=\"You can find the Secrets tab on the left panel.\" width=50%>\n",
+ "\n",
+ "2. Create a new secret with the name `GEMINI_API_KEY`.\n",
+ "3. Copy and paste your API key into the `Value` input box of `GEMINI_API_KEY`.\n",
+ "4. Toggle the button on the left to allow notebook access to the secret."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {
+ "id": "Kz9sccyGBqz3"
+ },
+ "outputs": [],
+ "source": [
+ "from google.colab import userdata\n",
+ "\n",
+ "GEMINI_API_KEY = userdata.get('GEMINI_API_KEY')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Import dependencies"
+ ],
+ "metadata": {
+ "id": "wwEvxB2_JO3h"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import os\n",
+ "import wave\n",
+ "\n",
+ "from collections.abc import Iterable\n",
+ "from collections.abc import Sequence\n",
+ "from io import BytesIO\n",
+ "from typing import Any\n",
+ "from typing import cast\n",
+ "\n",
+ "import apache_beam as beam\n",
+ "from apache_beam.io.fileio import FileSink\n",
+ "from apache_beam.io.fileio import WriteToFiles\n",
+ "from apache_beam.io.fileio import default_file_naming\n",
+ "from apache_beam.ml.inference.base import PredictionResult\n",
+ "from apache_beam.ml.inference.base import RunInference\n",
+ "from apache_beam.ml.inference.gemini_inference import GeminiModelHandler\n",
+ "from apache_beam.ml.inference.gemini_inference import generate_from_string\n",
+ "from apache_beam.ml.inference.gemini_inference import generate_image_from_strings_and_images\n",
+ "from apache_beam.options.pipeline_options import PipelineOptions\n",
+ "\n",
+ "from google import genai\n",
+ "from google.genai import types\n",
+ "from PIL import Image"
+ ],
+ "metadata": {
+ "id": "muTP7ywXJVea"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0a1zerXycQ0z"
+ },
+ "source": [
+ "## Run the pipeline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Text Output\n",
+ "This pipeline sends text prompts to Gemini and prints the text responses."
+ ],
+ "metadata": {
+ "id": "ERSXxZS-OwBc"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 766
+ },
+ "id": "St07XoibcQSb",
+ "outputId": "1ba4ef99-fcbb-441e-f991-bd370ff6f6c1"
+ },
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "application/javascript": [
+ "\n",
+ " if (typeof window.interactive_beam_jquery == 'undefined') {\n",
+ " var jqueryScript = document.createElement('script');\n",
+ " jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n",
+ " jqueryScript.type = 'text/javascript';\n",
+ " jqueryScript.onload = function() {\n",
+ " var datatableScript = document.createElement('script');\n",
+ " datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n",
+ " datatableScript.type = 'text/javascript';\n",
+ " datatableScript.onload = function() {\n",
+ " window.interactive_beam_jquery = jQuery.noConflict(true);\n",
+ " window.interactive_beam_jquery(document).ready(function($){\n",
+ " \n",
+ " });\n",
+ " }\n",
+ " document.head.appendChild(datatableScript);\n",
+ " };\n",
+ " document.head.appendChild(jqueryScript);\n",
+ " } else {\n",
+ " window.interactive_beam_jquery(document).ready(function($){\n",
+ " \n",
+ " });\n",
+ " }"
+ ]
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Input: What is 5+2?, Output: 5 + 2 = 7\n",
+ "Input: Who is the protagonist of Lord of the Rings?, Output: While **Frodo Baggins** is widely considered the main protagonist of *The Lord of the Rings*, J.R.R. Tolkien’s epic is an ensemble piece with multiple characters occupying the role of \"hero.\" \n",
+ "\n",
+ "Depending on how you analyze the story, there are three primary candidates for the protagonist:\n",
+ "\n",
+ "### 1. Frodo Baggins (The Narrative Protagonist)\n",
+ "Frodo is the literal protagonist of the central plot. As the Ring-bearer, he is tasked with the ultimate quest: carrying the One Ring to Mount Doom to destroy it. \n",
+ "* **Why he is the protagonist:** The main conflict of the entire story—the struggle against the corrupting influence of the Ring—takes place inside Frodo’s mind and soul. If Frodo fails, Middle-earth falls. He is the tragic hero who sacrifices his own peace and well-being so that others may live.\n",
+ "\n",
+ "### 2. Samwise Gamgee (The Moral and Thematic Hero)\n",
+ "While Frodo carries the Ring, his loyal servant and friend Samwise Gamgee is often considered the emotional heart of the story.\n",
+ "* **Why he is the protagonist:** Tolkien himself famously referred to Sam as the **\"chief hero\"** of the epic in one of his letters (Letter 131). Sam represents the ordinary, decent person who rises to extraordinary heights through sheer love and loyalty. Without Sam, Frodo would have died in the wilderness, and the quest would have failed. Sam is also the character who gets the final line of the book (\"Well, I'm back\"), signaling that he is the survivor who carries the [...]
+ "\n",
+ "### 3. Aragorn (The Epic/Classic Protagonist)\n",
+ "If *The Lord of the Rings* were a traditional, classical fantasy, Aragorn would be the undisputed protagonist. \n",
+ "* **Why he is the protagonist:** He has the classic \"Hero’s Journey\" arc—he is the exiled king living in disguise as a Ranger (Strider) who must overcome his doubts, claim his birthright, lead the armies of men, and defeat the dark lord’s armies. The third volume, *The Return of the King*, is named after him. Aragorn dominates the military and political plotlines of the story.\n",
+ "\n",
+ "### Summary\n",
+ "* **Frodo** is the **plot protagonist** (the story is about his quest).\n",
+ "* **Sam** is the **thematic protagonist** (he represents the ultimate virtues of the book: loyalty, hope, and humility).\n",
+ "* **Aragorn** is the **epic protagonist** (he is the traditional hero of the physical battles and political struggle).\n",
+ "Input: What is the air-speed velocity of a laden swallow?, Output: What do you mean? An African or a European swallow?\n",
+ "\n",
+ "***\n",
+ "\n",
+ "If you happen to be the Bridgekeeper from *Monty Python and the Holy Grail*, and you don't want to be cast into the Gorge of Eternal Peril, here is the actual breakdown of the math:\n",
+ "\n",
+ "### 1. The European Swallow (*Hirundo rustica*)\n",
+ "In 2003, software updates and avian physics calculations estimated the airspeed velocity of an **unladen European swallow** to be roughly **11 meters per second** (about **24 miles per hour** or 39 kilometers per hour). \n",
+ "\n",
+ "### 2. Can it be \"Laden\"?\n",
+ "A standard European swallow weighs about 20 grams. A standard coconut weighs about 1 pound (454 grams). \n",
+ "\n",
+ "According to the laws of aerodynamics, a 20-gram bird cannot maintain lift while carrying an object 22 times its own weight. Therefore, a European swallow **cannot be laden with a coconut**—at least, not individually.\n",
+ "\n",
+ "### 3. The \"Two Swallows\" Theory\n",
+ "As suggested by the castle guards in the film, two swallows could theoretically carry a coconut between them using a strand of creeper. \n",
+ "\n",
+ "However, they would have to have it \"simple, under the dorsal guiding feather,\" which opens up a whole new set of aerodynamic complications regarding synchronization and drag. \n",
+ "\n",
+ "### 4. The African Swallow\n",
+ "While the African swallow is non-migratory (and thus unlikely to bring a coconut to England anyway), it is slightly larger. However, it still wouldn't be large enough to carry a coconut solo.\n"
+ ]
+ }
+ ],
+ "source": [
+ "model_handler = GeminiModelHandler(\n",
+ "    model_name='gemini-3.5-flash',\n",
+ "    request_fn=generate_from_string,\n",
+ "    api_key=GEMINI_API_KEY,\n",
+ ")\n",
+ "\n",
+ "inputs: list[str] = [\n",
+ "    \"What is 5+2?\",\n",
+ "    \"Who is the protagonist of Lord of the Rings?\",\n",
+ "    \"What is the air-speed velocity of a laden swallow?\"\n",
+ "]\n",
+ "\n",
+ "class PostProcessor(beam.DoFn):\n",
+ "    def process(self, element: PredictionResult) -> Iterable[str]:\n",
+ "        for part in element.inference.parts:\n",
+ "            try:\n",
+ "                output_text = part.text\n",
+ "                yield f\"Input: {element.example}, Output: {output_text}\"\n",
+ "            except Exception as e:\n",
+ "                print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+ "                raise\n",
+ "\n",
+ "with beam.Pipeline() as p:\n",
+ "    _ = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+ "           | \"Query Gemini\" >> RunInference(model_handler)\n",
+ "           | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+ "           | \"Print output\" >> beam.Map(print)\n",
+ "    )"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Image Output\n",
+ "This pipeline sends a text prompt to a Gemini image model and writes the generated image to a PNG file."
+ ],
+ "metadata": {
+ "id": "k__Aja9FPLaf"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "model_handler = GeminiModelHandler(\n",
+ "    model_name='gemini-3.1-flash-image',  # Nano Banana 2\n",
+ "    request_fn=generate_from_string,\n",
+ "    api_key=GEMINI_API_KEY,\n",
+ ")\n",
+ "\n",
+ "class PostProcessor(beam.DoFn):\n",
+ "    def process(self, element: PredictionResult) -> Iterable[Image.Image]:\n",
+ "        try:\n",
+ "            response = element.inference\n",
+ "            for part in response.parts:\n",
+ "                if part.text is not None:\n",
+ "                    print(part.text)\n",
+ "                elif part.inline_data is not None:\n",
+ "                    image = Image.open(BytesIO(part.inline_data.data))\n",
+ "                    yield image\n",
+ "        except Exception as e:\n",
+ "            print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+ "            raise\n",
+ "\n",
+ "\n",
+ "class ImageSink(FileSink):\n",
+ "    def open(self, fh) -> None:\n",
+ "        self._fh = fh\n",
+ "\n",
+ "    def write(self, record):\n",
+ "        record.save(self._fh, format='PNG')\n",
+ "\n",
+ "    def flush(self):\n",
+ "        self._fh.flush()\n",
+ "\n",
+ "\n",
+ "inputs: list[str] = [\n",
+ "    \"Create a picture of a pineapple in the sand on the beach.\",\n",
+ "]\n",
+ "\n",
+ "with beam.Pipeline() as p:\n",
+ "    output = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+ "                | \"Query Gemini\" >> RunInference(model_handler)\n",
+ "                | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+ "                | \"WriteOutput\" >> WriteToFiles(\n",
+ "                    path='tmp/',\n",
+ "                    file_naming=default_file_naming(\"gemini-image\", \".png\"),\n",
+ "                    sink=ImageSink())\n",
+ "    )\n",
+ "    _ = output | \"Print output\" >> beam.Map(print)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "_C-VajNGOVYs",
+ "outputId": "f44dde53-117b-4d6a-85da-dafdca7839fb"
+ },
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "FileResult(file_name='gemini-image-00000-of-00001.png', shard_index=0, total_shards=1, window=GlobalWindow, pane=None, destination=None)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The pipeline writes the generated image to a file, so load that file to render the image in the notebook. Because the pipeline produces a single shard with a fixed file-naming prefix and suffix, the file path is known in advance."
+ ],
+ "metadata": {
+ "id": "RBACanDTSrtB"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "with Image.open(\"tmp/gemini-image-00000-of-00001.png\") as image:\n",
+ "    display(image)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 785
+ },
+ "id": "jsU10BL_QvNL",
+ "outputId": "be2247aa-96cf-4625-f14c-c668a40734c5"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "<PIL.PngImagePlugin.PngImageFile image mode=RGB size=1408x768>"
+ ],
+ "image/png":
"iVBORw0KGgoAAAANSUhEUgAABYAAAAMACAIAAAASU1SbAAEAAElEQVR4AVTd2Y4lS5+n5ZgjMvdXVd33fwuccAQHSEgtEFJLDKIRk+AOEIiub++MOXjen63IKnznXuHL3Ow/T2Zu7uv63/0P/9v7+/vt9d319fXV5/vn9dXX5/XX3c311f3VlbOvm6uv66+PD59X9/7X7etLy+dVx41DCwg3d7faP9/frm9rfP/6vP0C4er149Xn3c3N3d3dx1vnt7f1/Pj69On8ANT+/nn1+fl5f3sbtAP2sz7hdvXj2t+v9w8Uwvv69nZ9dw/X+9vX6+eHq88fb58f1y8f74Bof7y9Cf6V3l93V9cPD3e3Ef5xd3N7c3v9eIO2r7vrN9xcabi9vfm4jpbIw5GeXw/XD8hAFLk83N7hDVjA7+5uvm6uPz4+UHX7dfPx9fX8/vX+9fXn8+dfb28I/Ov96/
[...]
+ "image/jpeg":
"/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAMABYADASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKzt
[...]
+ },
+ "metadata": {}
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "### Use a custom request function\n",
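+ "Because the Gemini API uses the same `generate_content` call across model types, writing a custom request function to send arbitrary input to a model is straightforward. The function receives the model name, the batch of inputs, a `genai.Client`, and any inference arguments, and it returns the model's response. The following example uses a text-to-speech model."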
+ ],
+ "metadata": {
+ "id": "fGHwV5z_Umi6"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tts_config = types.GenerateContentConfig(\n",
+ "    response_modalities=[\"AUDIO\"],\n",
+ "    speech_config=types.SpeechConfig(\n",
+ "        voice_config=types.VoiceConfig(\n",
+ "            prebuilt_voice_config=types.PrebuiltVoiceConfig(\n",
+ "                voice_name='Charon',\n",
+ "            )\n",
+ "        )\n",
+ "    ),\n",
+ ")\n",
+ "\n",
+ "def generate_tts_from_string(\n",
+ "    model_name: str,\n",
+ "    batch: Sequence[str],\n",
+ "    model: genai.Client,\n",
+ "    inference_args: dict[str, Any]\n",
+ "):\n",
+ "    return model.models.generate_content(model=model_name,\n",
+ "                                         contents=cast(Any, batch),\n",
+ "                                         config=tts_config)\n",
+ "\n",
+ "model_handler = GeminiModelHandler(\n",
+ "    model_name='gemini-3.1-flash-tts-preview',\n",
+ "    request_fn=generate_tts_from_string,\n",
+ "    api_key=GEMINI_API_KEY,\n",
+ ")\n",
+ "\n",
+ "class PostProcessor(beam.DoFn):\n",
+ "    def process(self, element: PredictionResult) -> Iterable[Any]:\n",
+ "        try:\n",
+ "            response = element.inference\n",
+ "            for part in response.parts:\n",
+ "                if part.text is not None:\n",
+ "                    print(part.text)\n",
+ "                elif part.inline_data is not None:\n",
+ "                    yield part.inline_data\n",
+ "        except Exception as e:\n",
+ "            print(f\"Can't decode inference for element: {element.example}, got {e}\")\n",
+ "            raise\n",
+ "\n",
+ "\n",
+ "class AudioSink(FileSink):\n",
+ "    def open(self, fh) -> None:\n",
+ "        self._fh = fh\n",
+ "\n",
+ "    def write(self, record):\n",
+ "        with wave.open(self._fh, 'wb') as f:\n",
+ "            f.setnchannels(1)  # mono\n",
+ "            f.setsampwidth(2)  # 16-bit samples\n",
+ "            f.setframerate(24000)  # Gemini TTS returns 24 kHz PCM\n",
+ "            f.writeframes(record.data)\n",
+ "\n",
+ "    def flush(self):\n",
+ "        self._fh.flush()\n",
+ "\n",
+ "inputs: list[str] = [\n",
+ "    \"Say 'Hello, World!'\",\n",
+ "]\n",
+ "\n",
+ "with beam.Pipeline() as p:\n",
+ "    output = (p | \"Get prompts\" >> beam.Create(inputs)\n",
+ "                | \"Query Gemini\" >> RunInference(model_handler)\n",
+ "                | \"Process Output\" >> beam.ParDo(PostProcessor())\n",
+ "                | \"WriteOutput\" >> WriteToFiles(\n",
+ "                    path='tmp/',\n",
+ "                    file_naming=default_file_naming(\"gemini-audio\", \".wav\"),\n",
+ "                    sink=AudioSink())\n",
+ "    )\n",
+ "    _ = output | \"Print output\" >> beam.Map(print)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "1PGdvF5FVEmI",
+ "outputId": "8c48f6e0-a68d-4f95-dc9f-816984f7d314"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "FileResult(file_name='gemini-audio-00000-of-00001.wav', shard_index=0, total_shards=1, window=GlobalWindow, pane=None, destination=None)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "As with the image output, you can load the audio file once the pipeline is complete."
+ ],
+ "metadata": {
+ "id": "mvvImVFLdOts"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from IPython.display import Audio\n",
+ "Audio(filename='tmp/gemini-audio-00000-of-00001.wav', autoplay=False)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 75
+ },
+ "id": "yOMW_1mZbd-o",
+ "outputId": "9ad91686-3d01-4512-9d88-c09338530e81"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "<IPython.lib.display.Audio object>"
+ ],
+ "text/html": [
+ "\n",
+ " <audio controls=\"controls\" >\n",
+ " <source
src=\"data:audio/x-wav;base64,UklGRiQOAQBXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YQAOAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAAAAAP////////////////////////////////////8AAAAAAAA
[...]
+ " Your browser does not support the audio element.\n",
+ " </audio>\n",
+ " "
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
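
The notebook's `AudioSink` hardcodes the WAV parameters. A possible alternative reads the sample rate from the audio part's MIME type instead. It is sketched here as a hypothetical helper: `pcm_to_wav_bytes` is not a Beam or Gemini API. The sketch assumes that Gemini returns raw 16-bit mono PCM and reports a MIME type such as `audio/L16;codec=pcm;rate=24000`. When no rate is present, it falls back to 24 kHz. Check the MIME type your model actually returns before relying on either assumption.

```python
import io
import re
import wave
from typing import Optional


def pcm_to_wav_bytes(pcm: bytes, mime_type: Optional[str],
                     default_rate: int = 24000) -> bytes:
    """Hypothetical helper: wrap raw 16-bit mono PCM bytes in a WAV container.

    The sample rate is taken from a MIME type such as
    'audio/L16;codec=pcm;rate=24000' when one is present; otherwise
    default_rate is used. Both the MIME format and the 24 kHz default are
    assumptions, not guarantees from the Gemini API.
    """
    rate = default_rate
    if mime_type:
        match = re.search(r"rate=(\d+)", mime_type)
        if match:
            rate = int(match.group(1))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 16-bit samples
        f.setframerate(rate)
        f.writeframes(pcm)
    return buf.getvalue()
```

With this helper, the body of `AudioSink.write` could become `self._fh.write(pcm_to_wav_bytes(record.data, record.mime_type))`, because `record` is the `inline_data` blob yielded by the TTS `PostProcessor`. Keeping the conversion in a plain function also lets you test it without running a pipeline or calling the API.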