On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:
> Let me summarize what we talked here and follow up with a PR. > > - Iceberg should allow users to define a sort oder in its metadata that > applies to partitions. > - We should never assume the sort order is actually applied to all files > in the table. > This one seems really problematic. Too many important optimizations depend on the file sort order. Can we have the writer verify the sort order as the files are written? - Sort orders might evolve and change over time. When this happens, > existing files will not be rewritten. Query engines should follow the > updated sort order during subsequent writes. As a result, files within a > table or partition can be sorted differently at a given point in time. > - We should be able to define a sort order even for unpartitioned tables, > as opposed to current Spark tables that allow a sort order only for > bucketed tables. > - SortOrder is separate from PartitionSpec. > - SortOrder will rely on transformations to define complex sort orders. > - Files will be annotated with sort_order_id instead of sort_columns. We > keep the question of file_ordinal open for now. > - To begin with, we will support asc/desc natural sort orders (UTF8 > ordering for Strings). > > Thanks, > Anton > > On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote: > > I agree that Iceberg metadata should include a way to configure a desired > sort order. But I want to note that I don’t think that we can ever assume > that it has been applied. Table configuration will evolve as use changes. > We don’t want to require rewrites when a configuration gets updated, so an > assumption should be that data files might not be sorted. > > Files that are sorted should indicate how they are sorted, so that > optimizations are applied if the file’s metadata indicates it can be safely > applied. For example, if both deletes and data rows are sorted the same > way, you can merge the two streams instead of using a hash set to check > whether a record has been deleted. I think this should rely on the delete > file’s sort order matching the data file it is applied to. > > Should Iceberg allow users to define a sort spec only if the table is > bucketed? > > No. In Iceberg, bucketing is just another partition transform. > > However, I think we need to consider what a sort order will mean. Here are > a few observations: > > - Each file can have a sort order for its rows (Spark’s > sortWithinPartitions, which sorts each task’s data) > - Sorting is also used to cluster values across files so it makes > sense for a table sort order to be applied within partitions (ORDER BY) > - Multiple writes to the same partition are not expected to rewrite > existing data, so a partition may only be partially sorted or may have > multiple sorted file sets > - Partitioning is independent from sorting. Even when partitioning is > orthogonal to a sort order (i.e., bucketing), partitioning must still take > precedence. > > My conclusion is that a configured sort order applies to partitions, not > data across partitions. Again, bucketing is just another type of partition. > > How should Iceberg encode sort specs? > > I don’t think this should be in table properties. The sort order should > reference columns by ID so it doesn’t need to be changed when columns are > renamed. I think this should be implemented like PartitionSpec. > > If sorting is applied within partitions, then I would make PartitionSpec > and SortOrder separate. I would still use transforms to produce more > complex sort orders. I think that’s a great idea, but we don’t need to mix > partitioning and sorting to reuse transforms. Like partition specs, I think > a table should be able to define multiple sort orders and each should be > identified by ID. Then each data file can encode which sort order it was > written with, just like manifests and partition specs. > > I think we should add sort-orders like partition-specs, and a > default-sort-order-id like default-spec-id. This would also require > removing sort_columns from data files in the spec > <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We > can keep file_ordinal, but probably want to add some context to know the > group of files where it is valid. We could also remove it. > > Which sort orders should Iceberg support? > > I agree with what’s already been said: we should use a natural order for > each type > <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>, > ascending and descending. To start, Strings must use UTF-8’s natural > ordering and we can expand from there. > > Here’s what a sort order might look like: > > "sort-orders": [ > { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] }, > { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { > "source-id": 5, "ascending": false } ] }, > { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, > "source-ids": [4, 5] } ] }, > ] > > > On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi < > aokolnyc...@apple.com.invalid> wrote: > >> In order to begin prototyping, I would start with the following questions. >> >> 1) Does Iceberg need a sort spec? >> - I would say yes >> 2) Should Iceberg allow users to define a sort spec only if the table is >> bucketed? >> - I would say no, as it seems valid to have partitioned and sorted tables. >> 3) How should Iceberg encode sort specs? >> - Option #1 is to rely on table properties, which will allow us to use >> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am >> not sure it would be easy to encode non-trivial sort specs and track sort >> spec evolution (if needed). >> - Option #2 is to extend PartitionSpec to cover sorting as well. This >> option will allow us to use transformations to encode non-trivial sorts and >> won't require many changes to the codebase. >> - Option #3 is to store SortSpec separately from PartitionSpec. This >> will require more changes compared to Option #2 but can also give us extra >> flexibility. >> >> Each option has its own trade-offs, but I tend to think #2 is reasonable. >> >> 4) Which sort orders should Iceberg support? >> - I think we have to be flexible and support adding more sort orders >> later. In addition to what Owen said, we can add sorting based on >> multi-dimensional space-filling curves in the future. >> >> >> What do you think? >> >> Thanks, >> Anton >> >> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omal...@gmail.com> wrote: >> >> My thought is just like Iceberg has to define partitioning and bucketing, >> it has to define a canonical sort order. In particular, we can’t afford to >> have Spark, Presto, and Hive writing files in different orders. I believe >> the right approach is to define a sort order as a series of columns where >> each column is either ascending or descending and defining the natural sort >> order for each type. >> >> The hard bit will be if we need to support non-natural sorts of strings. >> For example, if we need to support case-insensitive sorts or the different >> collations that databases support, I’d hope that we could start with the >> default of utf-8 byte ordering and expand as needed. If you are curious >> what the different collations look like - >> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database >> . >> >> .. Owen >> >> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi < >> aokolnyc...@apple.com.INVALID> wrote: >> >> Hey folks, >> >> Iceberg users are advised not only to partition their data but also to >> sort within partitions by columns in predicates in order to get the best >> performance. Right now, this process is mostly manual and performed by >> users before writing. >> I am wondering if we should extend Iceberg metadata so that query engines >> can do this automatically in the future. We already have `sortColumns` in >> DataFile but they are not used. >> >> - Do we need a notion of sort columns in TableMetadata? >> - Spark’s sort spec is tightly coupled with bucketing and cannot be >> used alone. However, it seems reasonable to have partitioned and sorted >> tables without bucketing. How do we see this in Iceberg? >> - If we decide to have sort spec in the metadata, do we want to make >> it part of PartitionSpec or have it separately? >> >> Thanks, >> Anton >> >> >> >> > > -- > Ryan Blue > Software Engineer > Netflix > > >