Re: Materialized Views: Next Steps

2024-05-16 Thread Benny Chow
Sounds good. Another benefit of the struct model is that it's more extensible in the future when we need to disambiguate the same table that appears multiple times in the MV query tree. This could happen with time travel queries or branching. We may end up adding additional properties like a

Re: Materialized Views: Next Steps

2024-05-16 Thread Walaa Eldin Moustafa
Hi Benny, I have responded to the comment. I would suggest that we use this thread to evaluate properties model vs top level metadata model (to avoid discussion drift). If we have feedback on the actual properties used in the properties model as defined in the PR, we can have the discussion

Re: Materialized Views: Next Steps

2024-05-16 Thread Benny Chow
Hi Walaa, I left comments on your spec PR: https://github.com/apache/iceberg/pull/10280#pullrequestreview-2061922169 My last question about use cases was really about incremental refresh with aggregates. But I think this might be too complicated to try to model/discuss now, and so I agree with

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-05-16 Thread Jack Ye
+1 for a JSON/BSON type. We also had the same discussion internally, and a JSON type would play well with, for example, the SUPER type in Redshift: https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html, and can also provide better integration with the Trino JSON type. Looking

Re: Can I get started with a plain java app?

2024-05-16 Thread Steven Wu
Iceberg has a Java library, which is currently the most complete implementation of the spec (compared to other languages such as Python and Rust). You can certainly use the Java library directly to write and commit data to Iceberg. But you will likely need to implement quite a bit of code for things

Can I get started with a plain java app?

2024-05-16 Thread John D. Ament
Completely naive question, since I'm not familiar at all with the technologies. I wanted to demonstrate using Iceberg files as a way to ingest lots of data and persist it to S3. It seems like it can do this, but I have a feeling I need tools like Spark to do it. Is that true? Or can I hook it up