Contact emails

[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]

Explainer

https://github.com/webmachinelearning/prompt-api/blob/main/README.md

Specification

http://webmachinelearning.github.io/prompt-api/

Summary

An API for interacting with an AI language model using text, image, and
audio inputs. It supports use cases ranging from generating image captions
and performing visual searches to transcribing audio, classifying sound
events, generating text that follows specific instructions, and extracting
information or insights from text. It also supports structured outputs,
which constrain responses to a predefined format, typically expressed as a
JSON schema, so that downstream applications that require standardized
output formats can consume them reliably.

This API is also exposed in Chrome Extensions; this feature entry tracks
its exposure on the web. An enterprise policy
(GenAILocalFoundationalModelSettings) is available to disable downloading
of the underlying model, which would render this API unavailable.
Language support log:
- Chrome M139 and earlier supported only 'en'
- Chrome M140 added support for 'es' and 'ja'
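
As an illustration of the structured outputs mentioned in this summary, here
is a minimal TypeScript sketch. It assumes a session object whose prompt()
method accepts a responseConstraint option holding a JSON schema, as the
explainer describes. The PromptSession interface, the rating schema, and the
parseRating and rateReview helpers are hypothetical names introduced only
for this example; they are not taken from the specification.

```typescript
// Minimal shape of the session this sketch relies on. It is an assumption
// modeled on the explainer, not a copy of the specification's IDL.
interface PromptSession {
  prompt(input: string, options?: { responseConstraint?: object }): Promise<string>;
}

// Hypothetical JSON schema describing the only output the caller accepts.
const ratingSchema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "neutral", "negative"] },
    score: { type: "number" },
  },
  required: ["sentiment", "score"],
  additionalProperties: false,
};

type Sentiment = "positive" | "neutral" | "negative";
type Rating = { sentiment: Sentiment; score: number };

// Re-check the reply on the caller's side, so that a malformed response is
// reported as null instead of propagating into downstream code.
function parseRating(raw: string): Rating | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // Not JSON at all.
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const sentiment = record["sentiment"];
  const score = record["score"];
  const allowed = ["positive", "neutral", "negative"];
  if (typeof sentiment !== "string" || allowed.indexOf(sentiment) === -1) return null;
  if (typeof score !== "number") return null;
  return { sentiment: sentiment as Sentiment, score };
}

// Ask the model for a rating, passing the schema as the response constraint.
async function rateReview(session: PromptSession, review: string): Promise<Rating | null> {
  const raw = await session.prompt(`Rate this review: ${review}`, {
    responseConstraint: ratingSchema,
  });
  return parseRating(raw);
}
```

Validating the parsed reply again on the caller's side is a design choice of
this sketch rather than a requirement of the API; it keeps the calling code
independent of any one model's formatting habits.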

Blink component

Blink > AI > Prompt
<https://issues.chromium.org/issues?q=customfield1222907:%22Blink%20%3E%20AI%20%3E%20Prompt%22>

Web Feature ID

https://github.com/web-platform-dx/web-features/issues/3530

TAG review

https://github.com/w3ctag/design-reviews/issues/1093

TAG review status

Issues Open

Origin Trial Name

Prompt API

Chromium Trial Name

AIPromptAPIMultimodalInput

Origin Trial documentation link

https://github.com/webmachinelearning/prompt-api/blob/main/README.md

WebFeature UseCounter name

LanguageModel_Create

Risks

Interoperability and Compatibility

This feature, like all built-in AI features, has inherent interoperability
risks due to the use of AI models whose behavior is not fully specified.
See some general discussion in
https://www.w3.org/reports/ai-web-impact/#interop. In particular, because
the output in response to a given prompt varies by language model,
developers can write brittle code that relies on specific output formats
or quality and does not work across multiple browsers or multiple versions
of the same browser.

There are some reasons to be optimistic that web developers won't write
such brittle code. Language models are inherently nondeterministic, so
creating dependencies on their exact output is difficult. And many users
will not have the hardware necessary to run a language model, so developers
will need to treat the prompt API as an enhancement or provide an
appropriate fallback to cloud services.

Several parts of the API design also help steer developers in the right
direction. The API has clear availability testing features for developers
to use, and requires developers to state their required capabilities (e.g.,
modalities and languages) up front. Most importantly, the structured
outputs feature can help mitigate the risk of brittle code that relies on
specific output formats.
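
The availability testing and up-front capability declaration described
above could be used roughly as follows. This is a sketch under stated
assumptions: the LanguageModelLike interface mirrors the availability()
call and expectedInputs option from the explainer, and the chooseBackend
helper and the "built-in" / "fallback" labels are hypothetical names
introduced for illustration.

```typescript
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface ExpectedInput {
  type: "text" | "image" | "audio";
  languages?: string[];
}

// Assumed minimal shape of the global LanguageModel object.
interface LanguageModelLike {
  availability(options: { expectedInputs: ExpectedInput[] }): Promise<Availability>;
}

// Capabilities the page needs, declared once and up front.
const required: { expectedInputs: ExpectedInput[] } = {
  expectedInputs: [{ type: "text", languages: ["en", "ja"] }],
};

// Decide whether to use the built-in model or a fallback path.
// Simplification: anything other than "available" (including states where
// a download could be triggered) is treated as "use the fallback".
async function chooseBackend(
  lm: LanguageModelLike | undefined,
): Promise<"built-in" | "fallback"> {
  if (!lm) return "fallback"; // API not exposed in this browser.
  try {
    const state = await lm.availability(required);
    return state === "available" ? "built-in" : "fallback";
  } catch {
    return "fallback"; // Treat any failure as "not usable here".
  }
}

// In a page, the argument would typically be read defensively, e.g.
// chooseBackend((globalThis as any).LanguageModel).
```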

Gecko: Negative (https://github.com/mozilla/standards-positions/issues/1213)

WebKit: No signal (https://github.com/WebKit/standards-positions/issues/495)

Web developers: Strongly positive
(https://github.com/webmachinelearning/prompt-api/blob/main/README.md#stakeholder-feedback)

Other signals: We are also working with Microsoft Edge developers on this
feature; they are contributing the structured output functionality.

Activation

This feature would benefit from polyfills backed by any of: cloud services,
lazily-loaded client-side models using WebGPU, or the web developer's own
server. We anticipate that an ecosystem of such polyfills will grow as more
developers experiment with this API.
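
To make the polyfill idea concrete, here is a deliberately small sketch. It
assumes the page supplies its own completion function (for example, one that
calls the developer's server) and exposes it through an object shaped like a
subset of the API. The names installPromptPolyfill, Complete, and
PolyfilledLanguageModel are hypothetical, and a real polyfill would need to
cover much more of the surface (streaming, multimodal inputs, structured
outputs, and so on).

```typescript
// Developer-supplied backend: a cloud service, a WebGPU-hosted model, or
// the developer's own server. Hypothetical signature for this sketch.
type Complete = (prompt: string) => Promise<string>;

// The small subset of the API surface that this sketch emulates.
interface PolyfilledLanguageModel {
  availability(): Promise<"available">;
  create(): Promise<{ prompt(input: string): Promise<string> }>;
}

// Install the polyfill on `target` (e.g. globalThis) only when no
// LanguageModel is already present. Returns true if it was installed.
function installPromptPolyfill(
  target: Record<string, unknown>,
  complete: Complete,
): boolean {
  if ("LanguageModel" in target) return false; // Existing implementation wins.
  const polyfill: PolyfilledLanguageModel = {
    availability: async (): Promise<"available"> => "available",
    create: async () => ({
      prompt: (input: string) => complete(input),
    }),
  };
  target["LanguageModel"] = polyfill;
  return true;
}
```

Leaving an existing LanguageModel untouched keeps the polyfill a pure
fallback, so pages automatically use the browser's implementation wherever
it is present.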

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that
it has potentially high risk for Android WebView-based applications?

Not Applicable; this API is not available in WebView.


Goals for experimentation

No information provided

Reason this experiment is being extended

We are requesting an extension for the Prompt API trial. This is primarily
to allow us to:
1) Gather more feedback from developers.
2) Address critical bugs related to quality and language support.
3) Help finalize the API design, considering the impact of features like
   function calling.
More time will help us deliver a more robust API.

Ongoing technical constraints

No information provided

Debuggability

It is possible that giving DevTools more insight into the nondeterministic
states of the model, e.g. random seeds, could help with debugging. See
discussion at https://github.com/webmachinelearning/prompt-api/issues/74.
We also have some internal debugging pages which give more detail on the
model's status, e.g. chrome://on-device-internals, and parts of these might
be suitable to port into DevTools.

Will this feature be supported on all six Blink platforms (Windows, Mac,
Linux, ChromeOS, Android, and Android WebView)?

No

Not all platforms will come with a language model. In particular, in the
initial stages we are focusing on Windows, Mac, Linux and ChromeOS.

Is this feature fully tested by web-platform-tests
<https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md>?

No

We plan to cover as much of the API surface as possible with web platform
tests. The core responses from the model will be difficult to test, but
some facets are testable, e.g. adherence to structured output response
constraints.

Flag name on about://flags

prompt-api-for-gemini-nano-multimodal-input

Finch feature name

AIPromptAPIMultimodalInput

Requires code in //chrome?

True

Tracking bug

https://issues.chromium.org/issues/417530643

Launch bug

https://launch.corp.google.com/launch/4395635

Measurement

We have various use counters for the API, e.g. LanguageModel_Create.

Non-OSS dependencies

Does the feature depend on any code or APIs outside the Chromium open
source repository and its open-source dependencies to function?

Yes: this feature depends on a language model, which is bridged to the
open-source parts of the implementation via the interfaces in
//services/on_device_model.

Estimated milestones

Origin trial desktop first

139

Origin trial desktop last

144

Origin trial extension 1 end milestone

147

DevTrial on desktop

137

Anticipated spec changes

Open questions about a feature may be a source of future web compat or
interop issues. Please list open issues (e.g. links to known github issues
in the project for the feature specification) whose resolution may
introduce web compat/interop risk (e.g., changes to the naming or structure
of the API in a non-backward-compatible way).

https://github.com/webmachinelearning/prompt-api/issues/42 is somewhat
worth keeping an eye on, but we believe a forward-compatible approach is
possible by just providing constant min = max values.

Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5134603979063296?gate=5151092893679616

Links to previous Intent discussions

Intent to Prototype:
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAM0wra_LXU8KkcVJ0x%3DzYa4h_sC3FaHGdaoM59FNwwtRAsOALQ%40mail.gmail.com

Intent to Experiment:
https://groups.google.com/a/chromium.org/d/msgid/blink-dev/CAM0wra9oT0jygAYT00WPp0_wtZ-znrB2OdZ6GQb%2B3thFLP19pA%40mail.gmail.com

This intent message was generated by Chrome Platform Status
<https://chromestatus.com/>.
