On Thu, Jul 18, 2019 at 5:30 AM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> Let me summarize what we talked here and follow up with a PR.
>
> - Iceberg should allow users to define a sort oder in its metadata that
> applies to partitions.
> - We should never assume the sort order is actually applied to all files
> in the table.
>

This one seems really problematic. Too many important optimizations depend
on the file sort order. Can we have the writer verify the sort order as the
files are written?

- Sort orders might evolve and change over time. When this happens,
> existing files will not be rewritten. Query engines should follow the
> updated sort order during subsequent writes. As a result, files within a
> table or partition can be sorted differently at a given point in time.
> - We should be able to define a sort order even for unpartitioned tables,
> as opposed to current Spark tables that allow a sort order only for
> bucketed tables.
> - SortOrder is separate from PartitionSpec.
> - SortOrder will rely on transformations to define complex sort orders.
> - Files will be annotated with sort_order_id instead of sort_columns. We
> keep the question of file_ordinal open for now.
> - To begin with, we will support asc/desc natural sort orders (UTF8
> ordering for Strings).
>
> Thanks,
> Anton
>
> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> I agree that Iceberg metadata should include a way to configure a desired
> sort order. But I want to note that I don’t think that we can ever assume
> that it has been applied. Table configuration will evolve as use changes.
> We don’t want to require rewrites when a configuration gets updated, so an
> assumption should be that data files might not be sorted.
>
> Files that are sorted should indicate how they are sorted, so that
> optimizations are applied if the file’s metadata indicates it can be safely
> applied. For example, if both deletes and data rows are sorted the same
> way, you can merge the two streams instead of using a hash set to check
> whether a record has been deleted. I think this should rely on the delete
> file’s sort order matching the data file it is applied to.
>
> Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
>
> No. In Iceberg, bucketing is just another partition transform.
>
> However, I think we need to consider what a sort order will mean. Here are
> a few observations:
>
>    - Each file can have a sort order for its rows (Spark’s
>    sortWithinPartitions, which sorts each task’s data)
>    - Sorting is also used to cluster values across files so it makes
>    sense for a table sort order to be applied within partitions (ORDER BY)
>    - Multiple writes to the same partition are not expected to rewrite
>    existing data, so a partition may only be partially sorted or may have
>    multiple sorted file sets
>    - Partitioning is independent from sorting. Even when partitioning is
>    orthogonal to a sort order (i.e., bucketing), partitioning must still take
>    precedence.
>
> My conclusion is that a configured sort order applies to partitions, not
> data across partitions. Again, bucketing is just another type of partition.
>
> How should Iceberg encode sort specs?
>
> I don’t think this should be in table properties. The sort order should
> reference columns by ID so it doesn’t need to be changed when columns are
> renamed. I think this should be implemented like PartitionSpec.
>
> If sorting is applied within partitions, then I would make PartitionSpec
> and SortOrder separate. I would still use transforms to produce more
> complex sort orders. I think that’s a great idea, but we don’t need to mix
> partitioning and sorting to reuse transforms. Like partition specs, I think
> a table should be able to define multiple sort orders and each should be
> identified by ID. Then each data file can encode which sort order it was
> written with, just like manifests and partition specs.
>
> I think we should add sort-orders like partition-specs, and a
> default-sort-order-id like default-spec-id. This would also require
> removing sort_columns from data files in the spec
> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We
> can keep file_ordinal, but probably want to add some context to know the
> group of files where it is valid. We could also remove it.
>
> Which sort orders should Iceberg support?
>
> I agree with what’s already been said: we should use a natural order for
> each type
> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
> ascending and descending. To start, Strings must use UTF-8’s natural
> ordering and we can expand from there.
>
> Here’s what a sort order might look like:
>
> "sort-orders": [
>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { 
> "source-id": 5, "ascending": false } ] },
>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, 
> "source-ids": [4, 5] } ] },
>   ]
>
>
> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi <
> aokolnyc...@apple.com.invalid> wrote:
>
>> In order to begin prototyping, I would start with the following questions.
>>
>> 1) Does Iceberg need a sort spec?
>>   - I would say yes
>> 2) Should Iceberg allow users to define a sort spec only if the table is
>> bucketed?
>> - I would say no, as it seems valid to have partitioned and sorted tables.
>> 3) How should Iceberg encode sort specs?
>> - Option #1 is to rely on table properties, which will allow us to use
>> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
>> not sure it would be easy to encode non-trivial sort specs and track sort
>> spec evolution (if needed).
>> - Option #2 is to extend PartitionSpec to cover sorting as well. This
>> option will allow us to use transformations to encode non-trivial sorts and
>> won't require many changes to the codebase.
>>   - Option #3 is to store SortSpec separately from PartitionSpec. This
>> will require more changes compared to Option #2 but can also give us extra
>> flexibility.
>>
>> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>>
>> 4) Which sort orders should Iceberg support?
>> - I think we have to be flexible and support adding more sort orders
>> later. In addition to what Owen said, we can add sorting based on
>> multi-dimensional space-filling curves in the future.
>>
>>
>> What do you think?
>>
>> Thanks,
>> Anton
>>
>> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omal...@gmail.com> wrote:
>>
>> My thought is just like Iceberg has to define partitioning and bucketing,
>> it has to define a canonical sort order. In particular, we can’t afford to
>> have Spark, Presto, and Hive writing files in different orders. I believe
>> the right approach is to define a sort order as a series of columns where
>> each column is either ascending or descending and defining the natural sort
>> order for each type.
>>
>> The hard bit will be if we need to support non-natural sorts of strings.
>> For example, if we need to support case-insensitive sorts or the different
>> collations that databases support, I’d hope that we could start with the
>> default of utf-8 byte ordering and expand as needed. If you are curious
>> what the different collations look like -
>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>  .
>>
>> .. Owen
>>
>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
>> aokolnyc...@apple.com.INVALID> wrote:
>>
>> Hey folks,
>>
>> Iceberg users are advised not only to partition their data but also to
>> sort within partitions by columns in predicates in order to get the best
>> performance. Right now, this process is mostly manual and performed by
>> users before writing.
>> I am wondering if we should extend Iceberg metadata so that query engines
>> can do this automatically in the future. We already have `sortColumns` in
>> DataFile but they are not used.
>>
>>    - Do we need a notion of sort columns in TableMetadata?
>>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>>    used alone. However, it seems reasonable to have partitioned and sorted
>>    tables without bucketing. How do we see this in Iceberg?
>>    - If we decide to have sort spec in the metadata, do we want to make
>>    it part of PartitionSpec or have it separately?
>>
>> Thanks,
>> Anton
>>
>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
>
>

Reply via email to