Are these remotely accessible? And who pays?

I'm just wondering whether it's a datasource for regression testing.

For s3a we use public (free) parquet datasets for some of the scale read
testing. It keeps setup time minimal and stops "needs a few hundred MB of
data in s3" being a cost blocker for contributors (*).
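
For anyone wanting to try the same thing from Spark, here is a rough
sketch of the settings involved, not a copy of our test setup. The
credentials provider class and the per-bucket option pattern are real S3A
features; the bucket name and the helper function are placeholders made up
for illustration.

```python
# Sketch only: "example-public-bucket" is a placeholder, not a real dataset.
ANON_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"


def anonymous_s3a_conf(bucket: str) -> dict[str, str]:
    """Spark settings for unauthenticated reads from one public bucket."""
    # Per-bucket override, so other buckets keep their normal credentials.
    key = f"spark.hadoop.fs.s3a.bucket.{bucket}.aws.credentials.provider"
    return {key: ANON_PROVIDER}


conf = anonymous_s3a_conf("example-public-bucket")
for k, v in conf.items():
    # Print in the form accepted by spark-submit / spark-shell.
    print(f"--conf {k}={v}")
```

Scoping the anonymous provider to a single bucket means a contributor
needs no AWS account for the public data, while any private buckets in the
same test run still authenticate as usual.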

It'd be nice to have public Iceberg datasets in the various stores for
similar regression tests.

steve

(*) We use NOAA data; luckily the S3 bucket hasn't been decommissioned by
the US govt, though I did worry about that last year.

On Wed, 14 Jan 2026 at 21:27, Alex Stephen via dev <[email protected]>
wrote:

> Hi all,
>
> We just launched a public dataset (backed by a public Iceberg REST
> Catalog) that can be accessed by any Iceberg-enabled query engine. The goal
> is for Iceberg developers to begin diving into the ecosystem without
> bootstrapping a full catalog and creating data.
>
> We'd love to hear any of your thoughts on how we can improve it.
>
> Announcement blog post
> <https://opensource.googleblog.com/2026/01/explore-public-datasets-with-apache-iceberg-and-biglake.html>
> Example PySpark script
> <https://gist.github.com/rambleraptor/7fd2fd55a208da7e5c000430d54d8db4>
>
> Thanks!
>
> -- Alex Stephen
>
