Hey Ryan,

That’s great, I’ll get started on a PR right away!
Thanks 😊

From: Ryan Blue <[email protected]>
Reply to: "[email protected]" <[email protected]>, 
"[email protected]" <[email protected]>
Date: Tuesday, 19 November 2019 at 19:00
To: Iceberg Dev List <[email protected]>
Subject: Re: 'Examples' code contribution

Hi Christine,

It would be great for you to submit your code examples! I think that would be 
really helpful for other people as well.

For some things, it might also be a good idea to update the documentation on 
the ASF site, http://iceberg.apache.org. The source for the site is in the 
`site` folder on GitHub, if you think there are missing examples that would be 
beneficial to have on the site.

On Tue, Nov 19, 2019 at 5:19 AM Christine Mathiesen 
<[email protected]> wrote:
Hello!

Recently, I’ve been researching Iceberg with the goal of developing some simple 
code exemplifying how to use the Iceberg Java API. The goal was to share this 
internally with developers along with information we’ve gained about Iceberg to 
start discussions on whether we could use Iceberg in our systems. On reviewing 
the documentation and code, we thought this could be useful for anyone 
interested in learning more about Iceberg, so we would like to open source it. 
We noticed that Iceberg has a folder for examples 
(https://github.com/apache/incubator-iceberg/tree/master/examples). There 
isn’t much there right now, but it could be a good location for our examples and 
documentation.

Our project is currently structured as many small JUnit tests, each targeting a 
different piece of Iceberg functionality (such as reading and writing 
partitioned and unpartitioned tables, schema evolution, time travel, etc.). We 
went for this approach so we could use it as a sort of quickstart guide to using 
Iceberg with different use cases in mind.

The code we have currently focuses mainly on using HadoopTables with Spark (in 
Java) and contains tests that follow this sort of pattern:

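  @Test
  public void writeToTableFromFile() {
    // Load the sample records from a JSON file into a DataFrame.
    Dataset<Row> df = spark.read().json(dataLocation + "/employees.json");

    // Append the selected columns to the Iceberg table at tableLocation.
    df.select("name", "salary").write()
      .format("iceberg")
      .mode("append")
      .save(tableLocation.toString());

    // Pick up the snapshot committed by the write above.
    table.refresh();

    // Note: this view is backed by the source DataFrame, not the Iceberg table.
    df.createOrReplaceTempView("table");

    Dataset<Row> sqlDF = spark.sql("select * from table");
    // JUnit's assertEquals takes (expected, actual).
    assertEquals(10, sqlDF.count());
  }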

Could the developers on the project let us know if they think the above would 
be a useful contribution and if so, what the next steps would be? We’re happy 
to answer any questions and provide more info etc.

Thank you and all the best,

Christine Mathiesen
Software Development Intern
BDP – Hotels.com
Expedia Group



--
Ryan Blue
Software Engineer
Netflix
