Re: [Discuss] Geospatial Support

2024-06-05 Thread Szehon Ho
bs >> can parse projjson. >> >> @Szehon Is there a way that we can support both SRID and PROJJSON in Geo >> Iceberg? >> >> It is also worth noting that, although there are many libs that can parse >> SRID and perform look-up in the EPSG database, the license of

Re: [Discuss] Geospatial Support

2024-05-29 Thread Szehon Ho
ferent data providers. > > To address this we would like to propose including the option to specify > the SRS with only a SRID in phase 1. The query engine may choose to treat > it as an opaque identifier or look it up in the EPSG database if > supported. > > Thank you again for

Re: [Discuss] Heap pressure with RewriteFiles APIs

2024-05-21 Thread Szehon Ho
Hi Naveen, Yes, it sounds like disabling metrics for those columns would help. IIRC, by default manifest entries have metrics at the 'truncate(16)' level for up to 100 columns, which, as you see, can be quite memory intensive. A potential later improvement is also to have the ability to remove counts by
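To make the memory cost above concrete, here is a minimal Python sketch of what a `truncate(16)` metrics mode keeps for a string column: a lower bound cut to its first 16 characters, and an upper bound cut to 16 characters with the last character incremented so it still sorts at or above the original value. This is a simplified model for illustration only; the function names are hypothetical and not Iceberg's API. Setting a column's mode to `none` (via the `write.metadata.metrics.default` or `write.metadata.metrics.column.<name>` table properties) drops these bounds entirely.

```python
from typing import List, Optional


def truncate_lower(value: str, width: int = 16) -> str:
    """Lower bound under truncate(width): keep only the first `width` characters."""
    return value[:width]


def truncate_upper(value: str, width: int = 16) -> Optional[str]:
    """Upper bound under truncate(width).

    Short values are kept as-is. Longer values are cut to `width` characters
    and the last character is incremented, so the bound still sorts at or
    above the original. Returns None if no such bound exists.
    """
    if len(value) <= width:
        return value
    prefix = list(value[:width])
    # Walk backwards; if a character is already the maximum code point,
    # drop it and try to increment the one before it.
    for i in range(len(prefix) - 1, -1, -1):
        code = ord(prefix[i]) + 1
        if 0xD800 <= code <= 0xDFFF:  # skip the UTF-16 surrogate range
            code = 0xE000
        if code <= 0x10FFFF:
            prefix[i] = chr(code)
            return "".join(prefix[: i + 1])
    return None


def bound_bytes(values: List[str], width: int = 16) -> int:
    """UTF-8 bytes needed to store the lower and upper bound of one column."""
    lower = truncate_lower(min(values), width)
    upper = truncate_upper(max(values), width)
    return len(lower.encode("utf-8")) + (len(upper.encode("utf-8")) if upper else 0)


# A 40-character value costs at most 16 + 16 characters of bounds per column.
print(bound_bytes(["x" * 40]))  # 32
```

Multiplied across up to 100 columns and every manifest entry held in memory, even these capped bounds add up, which is why turning metrics off for wide string columns can relieve heap pressure.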

Re: Materialized Views: Next Steps

2024-05-10 Thread Szehon Ho
rk/sql/connector/catalog/ViewInfo.java#L45 > > Thanks, > Walaa. > > On Thu, May 9, 2024 at 11:30 PM Szehon Ho wrote: > >> Hi Walaa >> >> As there may be confusion in the word 'properties', I want to double >> check if we are talking about the same thing here

Re: Materialized Views: Next Steps

2024-05-10 Thread Szehon Ho
3QB4 > [2] > https://docs.google.com/document/d/1zg0wQ5bVKTckf7-K_cdwF4mlRi6sixLcyEh6jErpGYY/edit?pli=1=AAABIonvCGE > > Thanks, > Walaa. > > > On Thu, May 9, 2024 at 5:49 PM Szehon Ho wrote: > >> Hi Walaa, >> >> I agree, I definitely do not want yet an

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
now. If we agree, we can continue the > discussion on the PR, else, we can create a doc. > > Thanks, > Walaa. > > > On Thu, May 9, 2024 at 4:39 PM Szehon Ho wrote: > >> Thanks Walaa for driving it forward, looking forward to thinking about >> implementation

Re: Materialized Views: Next Steps

2024-05-09 Thread Szehon Ho
Thanks Walaa for driving it forward, looking forward to thinking about implementation of Materialized Views. I see Jan's point, the PR spec change is similar but does not seem to be completely aligned with the Draft Spec in the design doc:

[Discuss] Geospatial Support

2024-05-01 Thread Szehon Ho
Hi everyone, We have created a formal proposal for adding Geospatial support to Iceberg. Please read the following for details. - Github Proposal : https://github.com/apache/iceberg/issues/10260 - Proposal Doc:

Re: [Proposal] Add support for Materialized Views in Iceberg

2024-04-22 Thread Szehon Ho
+1 for the approach, given it reduces the work. On this, as it exposes storage tables to the user's catalog, I was mainly thinking we should have a common suffix/naming pattern for storage tables across catalogs. The Netflix approach sounds good to me. Hope we can continue the proposal, as there's still

Re: [VOTE] Release Apache Iceberg 1.5.1 RC0

2024-04-22 Thread Szehon Ho
+1 (binding) * Verify signature * Verify checksum * Verify licenses * Build and run basic test with Spark 3.5 Thanks Szehon On Sun, Apr 21, 2024 at 11:45 PM Ajantha Bhat wrote: > +1 (non-binding) > > * validated checksum and signature > * checked license docs & ran RAT checks > * ran build
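The "Verify checksum" step in these release votes amounts to recomputing the SHA-512 digest of the source tarball and comparing it with the published `.sha512` file. A minimal sketch using only Python's `hashlib`, assuming the checksum file begins with the hex digest (as `sha512sum` / `shasum -a 512` output does); the file names in the usage comment are placeholders. Signature verification with `gpg --verify` is a separate step and is not shown.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 of a file, streamed so large tarballs need not fit in memory."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(artifact: Path, checksum_file: Path) -> bool:
    """Compare against the first whitespace-separated token of the .sha512 file."""
    expected = checksum_file.read_text().split()[0].lower()
    return sha512_of(artifact) == expected


# Usage (placeholder names):
# verify_checksum(Path("apache-iceberg-X.Y.Z.tar.gz"),
#                 Path("apache-iceberg-X.Y.Z.tar.gz.sha512"))
```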

Re: Materialized view integration with REST spec

2024-03-22 Thread Szehon Ho
s back? > > On Fri, Mar 22, 2024 at 10:35 AM Szehon Ho > wrote: > >> Hi >> >> My understanding was last time it was still unresolved, and the action >> item was on Jack and/or Jan to make a shorter document. I think the >> debate now has boiled down to Ryan'

Re: Materialized view integration with REST spec

2024-03-22 Thread Szehon Ho
>>> I originally excluded option 2 because I think it does not >>>>>>>>>>>>> align with the REST spec, but after the other discussion thread >>>>>>>>>>>>> about "Inconsistency

Re: New committer: Renjie Liu

2024-03-11 Thread Szehon Ho
Congratulations! On Mon, Mar 11, 2024 at 12:43 PM Jack Ye wrote: > Congratulations Renjie! > > Best, > Jack Ye > > On Mon, Mar 11, 2024, 8:24 AM Ryan Blue wrote: > >> Congratulations, Renjie! Thanks for all your contributions! >> >> On Mon, Mar 11, 2024 at 12:52 AM Eduard Tudenhoefner >>

Re: [VOTE] Release Apache Iceberg 1.5.0 RC6

2024-03-08 Thread Szehon Ho
+1 (binding) * Verified signature * Verified checksum * RAT check * built JDK 11 * Ran basic tests on Spark 3.5 Thanks Szehon On Fri, Mar 8, 2024 at 5:50 PM Amogh Jahagirdar wrote: > +1 non-binding > > Verified signatures,checksums,RAT checks, build, and tests with JDK11. I > also ran ad-hoc

Re: New committer: Bryan Keller

2024-03-05 Thread Szehon Ho
Congratulations Bryan, well deserved, great work on Iceberg ! On Tue, Mar 5, 2024 at 8:14 AM Jack Ye wrote: > Congrats Bryan! > > -Jack > > On Tue, Mar 5, 2024 at 7:33 AM Amogh Jahagirdar wrote: > >> Congratulations Bryan! Very well deserved, thank you for all your >> contributions! >> >> On

Re: [VOTE] Release Apache Iceberg 1.5.0 RC4

2024-03-01 Thread Szehon Ho
+1 (binding) - Verified signature - Verified checksum - RAT check - Compiled - Manually ran basic queries on Spark 3.5 On Fri, Mar 1, 2024 at 6:13 AM Fokko Driesprong wrote: > +1 (binding) > > - Checked checksum and signature > - Ran a modified version of dbt-spark to take advantage of the

Re: Materialized view integration with REST spec

2024-02-29 Thread Szehon Ho
Hi, Yes, I mostly agree with the assessment. To clarify a few minor points: is a materialized view a view and a separate table, a combination of the > two (i.e. commits are combined), or a new metadata type? For 'new metadata type', I mostly consider Jack's initial proposal of a new Catalog MV

Re: Materialized view integration with REST spec

2024-02-22 Thread Szehon Ho
these separate from discussions about single points >>>> so that they can be persisted in the document. >>> >>> >>> Not sure if it is helpful, but I added voting chips to Question 0, as maybe an >>> easier way to keep track of votes. If it is helpful, I can ad

Re: Materialized view integration with REST spec

2024-02-21 Thread Szehon Ho
think >>> this format is not effective, I propose that we create a new mv channel in >>> Iceberg Slack workspace, and people interested can join and discuss all >>> these points directly. What do we think? >>> >>> Best, >>> Jack Ye >>

Re: Materialized view integration with REST spec

2024-02-19 Thread Szehon Ho
Hi, Great to see more discussion on the MV spec. Actually, Jan's document "Iceberg Materialized View Spec" has been organized, with a "Design Questions" section to track these debates, and it would be nice to

Re: Spec change for multi-arg transform

2024-01-30 Thread Szehon Ho
Sorry, I may have misunderstood the statement, and maybe this is specific to multi-arg transforms. In any case, let's get a spec PR in earlier to discuss/specify the behavior for V1-2 vs. V3. Thanks Szehon On Tue, Jan 30, 2024 at 9:23 AM Szehon Ho wrote: > Thanks all for the discuss

Re: Spec change for multi-arg transform

2024-01-30 Thread Szehon Ho
the large spec updates, but in this case you >>>> haven't seen one since we haven't built the reference implementation yet. >>>> >>>> I think the confusion here comes from updating the spec markdown doc >>>> prematurely. I think the PR that was merged is missing

Re: Spec change for multi-arg transform

2024-01-28 Thread Szehon Ho
Hi, This would not be retrofitting existing partition transforms, but just allowing for the creation of new multi-arg transforms. Is the concern that some implementations are never expecting new transforms to be added? Old implementations would indeed not be able to read Iceberg tables created

Table owned locations

2023-08-29 Thread Szehon Ho
Hi all, As you know, there is a recurring Iceberg issue where delete orphan file operations may inadvertently delete another table's data if the tables are misconfigured to have the same location. A while back, Anton had a proposal for 'owned.locations' in: https://github.com/apache/iceberg/issues/4159
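Why a shared location is dangerous follows from how orphan-file removal decides what to delete: it lists the files under the table location and treats anything the table's metadata does not reference as an orphan. A minimal Python model of that set difference, with made-up paths; it is not Iceberg's orphan-file action (which also applies an age threshold, omitted here) and does not model the 'owned.locations' proposal.

```python
from typing import Iterable, List


def find_orphans(listed_files: Iterable[str], referenced_files: Iterable[str]) -> List[str]:
    """Files found by listing the location that this table's metadata does not reference."""
    return sorted(set(listed_files) - set(referenced_files))


# Two tables misconfigured to share s3://bucket/shared/ (made-up paths).
table_a_files = {"s3://bucket/shared/data/a-1.parquet"}
table_b_files = {"s3://bucket/shared/data/b-1.parquet"}

# Listing the shared location returns both tables' files.
listing = table_a_files | table_b_files

# Orphan removal for table A flags table B's live file as an orphan.
print(find_orphans(listing, table_a_files))
# ['s3://bucket/shared/data/b-1.parquet']
```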

Re: Proposal to fix the docs - this time it'll be different

2023-07-27 Thread Szehon Ho
Hi, I'm OK with putting things back in the Iceberg repo; it gets more visibility on PRs. I guess it used to be a bit distracting, but now, with more projects in Iceberg (pyiceberg, rust), we have to use tags to filter through all the mails anyway. Just wanted to +1 on Fokko/Ryan's suggestion to avoid

[ANNOUNCE] Apache Iceberg release 1.3.1

2023-07-25 Thread Szehon Ho
I'm pleased to announce the release of Apache Iceberg 1.3.1! Apache Iceberg is an open table format for huge analytic datasets. Iceberg delivers high query performance for tables with tens of petabytes of data, along with atomic commits, concurrent writes, and SQL-compatible table evolution.

[PASSED][VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-24 Thread Szehon Ho
Szehon On Mon, Jul 24, 2023 at 2:21 PM Szehon Ho wrote: > +1 (binding) > > 1. Verify signatures > 2. Verify checksums > 3. Verify license documentation > 4. Built and ran tests, only failure is TestS3RestSigner > 5. Ran simple queries against Spark 3.4 > > Thanks > S

Re: [VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-24 Thread Szehon Ho
age.execute(ApiCallAttemptMetricCollectionStage.java:36) >> at >> software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) >> ... 23 more >> >> Best, >> >> Yufei >>

[VOTE] Release Apache Iceberg 1.3.1 RC1

2023-07-17 Thread Szehon Ho
Hi Everyone, I propose that we release the following RC as the official Apache Iceberg 1.3.1 release. The commit ID is 62c34711c3f22e520db65c51255512f6cfe622c4 * This corresponds to the tag: apache-iceberg-1.3.1-rc1 * https://github.com/apache/iceberg/commits/apache-iceberg-1.3.1-rc1 *

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-14 Thread Szehon Ho
e great to backport this to 1.3.x as > well. > > Kind regards, > Fokko > > Op wo 12 jul 2023 om 22:09 schreef Szehon Ho : > >> Hi guys >> >> Just an update on this. Another issue came up about the new 1.3.0 >> function rewrite_position_deletes (thanks Fok

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-12 Thread Szehon Ho
/milestones/Iceberg%201.3.1 Thanks Szehon On Mon, Jul 10, 2023 at 11:14 AM Szehon Ho wrote: > Thanks Eduard! Merged all your backport prs, I will commit the last one > probably tomorrow and then we can start the release. > > Thanks > Szehon > > On Sun, Jul 9, 2023 at 11:53 P

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-10 Thread Szehon Ho
that we can start backporting those bug fixes. > > Eduard > > On Fri, Jul 7, 2023 at 6:52 PM Szehon Ho wrote: > >> Thanks a lot Eduard! I think https://github.com/apache/iceberg/pull/7933 >> is also a good candidate as well. >> >> Thanks, >> Szehon

Re: [DISCUSS] Apache Iceberg Release 1.3.1

2023-07-07 Thread Szehon Ho
9:02 PM Jean-Baptiste Onofré >> wrote: >> >>> Hi, >>> >>> It sounds good to me to have 1.3.1. >>> >>> Thanks ! >>> Regards >>> JB >>> >>> On Fri, Jul 7, 2023 at 12:53 AM Szehon Ho >>> wrote:

[DISCUSS] Apache Iceberg Release 1.3.1

2023-07-06 Thread Szehon Ho
Hi, I wanted to start a discussion on whether it's the right time for 1.3.1, a patch release of 1.3.0. It was prompted by the issue found by Xiangyang (@ConeyLiu): https://github.com/apache/iceberg/pull/7931#pullrequestreview-1507935277. Do people have any other bug fixes that should be

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-26 Thread Szehon Ho
it. > > Thanks for reviving the effort. > Manu > > On Thu, Jun 22, 2023 at 00:45 Szehon Ho wrote: > >> Hi, >> >> Yea, it's definitely an issue. >> >> FWIW, I was looking at reviving the old effort in Spark to pass in >> configs dynamically in Spark SQL statem

Re: allowing configs to be specified in SQLConf for Spark reads/writes

2023-06-21 Thread Szehon Ho
Hi, Yea, it's definitely an issue. FWIW, I was looking at reviving the old effort in Spark to pass in configs dynamically in Spark SQL statements, which is probably the cleanest solution. (https://github.com/apache/spark/pull/34072 was the old effort, and I made

Re: Iceberg old partition gc

2023-06-03 Thread Szehon Ho
tem. tagging can extend the history with selective snapshots. > > It seems that you are saying that purging actions of old partitions are > creating new snapshots, which are taking up some space in the snapshot > history. But if snapshot expiration is time based (like 7 days), this > s

Re: Iceberg old partition gc

2023-06-02 Thread Szehon Ho
can then recover > the snapshot if you happen to have accidentally TTL'd a partition. > > On Fri, Jun 2, 2023 at 8:51 AM Szehon Ho wrote: > >> I think this violates Iceberg’s assumption of immutable snapshots. That >> would require modifying the old snapshot to no longer point to

Re: Iceberg old partition gc

2023-06-02 Thread Szehon Ho
I think this violates Iceberg’s assumption of immutable snapshots. That would require modifying the old snapshot to no longer point to those gc’ed data files; otherwise, how could you time-travel to read from that snapshot if some of its files are deleted? That being said, I also had this
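The reasoning above, that a data file can only be physically removed once no retained snapshot references it, can be sketched as a reachability check. A minimal Python model with hypothetical snapshot-to-file mappings; Iceberg's snapshot expiration performs a comparable computation over manifests, which this sketch does not model.

```python
from typing import Dict, Set


def deletable_files(snapshots: Dict[int, Set[str]], retained: Set[int]) -> Set[str]:
    """Files referenced only by snapshots that are no longer retained."""
    still_needed = set().union(*(snapshots[s] for s in retained))
    expired = set().union(
        *(files for snapshot_id, files in snapshots.items() if snapshot_id not in retained))
    return expired - still_needed


# Snapshot 1 holds an old partition's file; snapshot 2 dropped it (made-up data).
history = {1: {"old/part-0.parquet", "new/part-0.parquet"},
           2: {"new/part-0.parquet"}}

print(deletable_files(history, retained={1, 2}))  # set(): snapshot 1 can still be read
print(deletable_files(history, retained={2}))     # {'old/part-0.parquet'}
```

Dropping a partition only makes its files collectible after every snapshot that still points at them has expired, which keeps time travel correct without rewriting old snapshots.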

Re: [DISCUSS] Default format version for new tables?

2023-05-24 Thread Szehon Ho
Hi, I'm +1 to making v2 the default, say after this release. It seems most of the features brought up as concerns on Spark side in the thread Gabor linked have been implemented (like position delete lifecycle). But Anton's point is also good. Even if some delete file features are missing, V2

Re: [VOTE] Release Apache Iceberg 1.3.0 RC0

2023-05-24 Thread Szehon Ho
+1 (binding) 1. verify signatures 2. verify checksum 3. verify license documentation 4. build and run tests 5. Ran simple tests on Spark 3.4 - Create simple table and check metadata tables - Ran 'delete from' statement to generate position delete, and run rewrite_position_delete Thanks Szehon

Re: Welcome new committers and PMC!

2023-05-05 Thread Szehon Ho
Thanks all, really appreciate it, and congrats to Eduard and Amogh ! Szehon On Fri, May 5, 2023 at 12:37 AM Mingliang Liu wrote: > Congrats! All well deserved. > > On Thu, May 4, 2023 at 11:50 PM Eduard Tudenhoefner > wrote: > >> Thanks everyone, and also congrats to Amogh and Szehon! >> >>

Re: tradeoffs between serializable vs snapshot isolation for single writer

2023-05-04 Thread Szehon Ho
Whoops, I didn’t see Ryan had already answered. > On May 4, 2023, at 3:18 PM, Szehon Ho wrote: > > Hi, > > I believe it only matters if you have conflicting commits. For single writer > case, I think you are right and it should not matter, so you may save very > sl

Re: tradeoffs between serializable vs snapshot isolation for single writer

2023-05-04 Thread Szehon Ho
Hi, I believe it only matters if you have conflicting commits. For the single-writer case, I think you are right and it should not matter, so you may save very slightly on performance by switching to snapshot isolation. The checks are metadata checks, though, so I would think it will not be a
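The point that the isolation level only matters when commits conflict can be illustrated with a toy model. This Python sketch is a simplified, hypothetical model, not Iceberg's validation code: it assumes both levels reject a commit when a concurrent commit removed files the operation read, while serializable additionally rejects it when a concurrent commit added data in partitions the operation touches. With a single writer there are no concurrent commits, so both levels accept.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ConcurrentCommit:
    """What a commit that landed after our operation started did (simplified)."""
    added_data_partitions: Set[str] = field(default_factory=set)
    removed_files: Set[str] = field(default_factory=set)


def can_commit(isolation: str, touched_partitions: Set[str], files_read: Set[str],
               concurrent: List[ConcurrentCommit]) -> bool:
    """Return True if our commit passes the (toy) validation for the isolation level."""
    if isolation not in ("serializable", "snapshot"):
        raise ValueError(f"unknown isolation level: {isolation}")
    for other in concurrent:
        # Both levels: a file we read and are replacing was removed underneath us.
        if other.removed_files & files_read:
            return False
        # Serializable only: new data appeared where our operation's filter applies.
        if isolation == "serializable" and other.added_data_partitions & touched_partitions:
            return False
    return True


append = ConcurrentCommit(added_data_partitions={"day=2023-05-04"})
print(can_commit("serializable", {"day=2023-05-04"}, {"f1"}, [append]))  # False
print(can_commit("snapshot", {"day=2023-05-04"}, {"f1"}, [append]))      # True
print(can_commit("serializable", {"day=2023-05-04"}, {"f1"}, []))        # True: single writer
```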

Re: [Proposal] Partition stats in Iceberg

2023-05-02 Thread Szehon Ho
g forward to the work in the phase 2 implementation. > Let me know if I can help, thanks. > > On Tue, May 2, 2023 at 4:28 PM Szehon Ho wrote: > >> Yea I agree, I had a handy query for the last update time of partition. >> >> SELECT >> >> e.data_file.partition,

Re: [Proposal] Partition stats in Iceberg

2023-05-02 Thread Szehon Ho
Yea, I agree. I had a handy query for the last update time of a partition: SELECT e.data_file.partition, MAX(s.committed_at) AS last_modified_time FROM db.table.snapshots s JOIN db.table.entries e ON s.snapshot_id = e.snapshot_id GROUP BY e.data_file.partition It's a bit lengthy
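The join in that query can be mirrored in plain Python to show what it computes: each entry points at the snapshot that wrote it, and a partition's last-modified time is the latest `committed_at` among those snapshots. The rows below are made-up stand-ins for the `snapshots` and `entries` metadata tables, flattened to the columns the query uses.

```python
from collections import defaultdict
from datetime import datetime

# Made-up rows: (snapshot_id, committed_at) and (snapshot_id, partition).
snapshots = [
    (1, datetime(2023, 5, 1, 9, 0)),
    (2, datetime(2023, 5, 2, 9, 0)),
    (3, datetime(2023, 5, 3, 9, 0)),
]
entries = [
    (1, "day=2023-05-01"),
    (2, "day=2023-05-01"),
    (2, "day=2023-05-02"),
    (3, "day=2023-05-02"),
]


def last_modified_by_partition(snapshots, entries):
    """Same shape as: SELECT partition, MAX(committed_at) ... GROUP BY partition."""
    committed_at = dict(snapshots)  # snapshot_id -> commit time (the JOIN key)
    result = defaultdict(lambda: datetime.min)
    for snapshot_id, partition in entries:
        ts = committed_at.get(snapshot_id)
        if ts is not None:  # inner join: skip entries whose snapshot is missing
            result[partition] = max(result[partition], ts)
    return dict(result)


print(last_modified_by_partition(snapshots, entries))
```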

Re: Welcome new PMC members!

2023-04-12 Thread Szehon Ho
Nice, congratulations guys! Szehon On Wed, Apr 12, 2023 at 12:35 AM Gidon Gershinsky wrote: > Congrats Fokko, Steven, Yufei! > > Cheers, Gidon > > > On Wed, Apr 12, 2023 at 7:14 AM Ajantha Bhat > wrote: > >> Congratulations to all. >> >> On Wed, Apr 12, 2023 at 8:51 AM OpenInx wrote: >> >>>

Re: [VOTE] Release Apache Iceberg 1.2.1 RC2

2023-04-07 Thread Szehon Ho
+1 (non-binding) Verified signature Verified checksum Verified License Built and ran tests Ran simple queries on spark 3.3. Thanks Dan for the release, Szehon On Thu, Apr 6, 2023 at 12:04 PM Daniel Weeks wrote: > Hi Everyone, > > I propose that we release the following RC as the official

Re: [Discuss] Allow all users who have Committed to the project to run CI without Approval

2023-03-29 Thread Szehon Ho
+1 Thanks Szehon On Wed, Mar 29, 2023 at 10:27 AM Eduard Tudenhoefner wrote: > +1 for "Only requires approval first time" > > On Wed, Mar 29, 2023 at 6:32 PM John Zhuge wrote: > >> +1 for "Only requires approval first time" >> >> On Wed, Mar 29, 2023 at 9:03 AM Ajantha Bhat >> wrote: >> >>>

Re: [VOTE] Release Apache Iceberg 1.2.0 RC1

2023-03-15 Thread Szehon Ho
Hi, One note: on this release, I ran some simple Spark SQL using a local Spark, like "insert into table select 1". I find that any of these operations now spawns 200 executors and takes a while to finish. |== Physical Plan ==\nAppendData

Re: In Remembrance of Kyle

2022-12-06 Thread Szehon Ho
Very shocked when I first heard this over the weekend. Became more sad when I learned how long he was sick for, and so humbled that he chose to spend so many of his last days with us in the Iceberg community. I did not have a chance to work directly with him at Apple as I was on a different

Re: [VOTE] Release Apache Iceberg 1.1.0 RC2

2022-11-17 Thread Szehon Ho
+1 (non-binding) 1. Verify signature 2. Verify checksum 3. License RAT check 4. Run unit test, Actually got a failure: org.apache.iceberg.spark.extensions.TestCopyOnWriteDelete > testDeleteWithSnapshotIsolation[catalogName = spark_catalog, implementation =

RemoveDanglingDeleteFile proposal

2022-11-04 Thread Szehon Ho
Hi all, I made a proposal about adding a Spark procedure, RemoveDanglingDeleteFiles. It would do a more comprehensive job of removing delete files that stay around after they become invalid (stop applying to any data files), which happens in some cases, taking up storage and potentially affecting
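One way to decide that a delete file has stopped applying uses the sequence-number rules from the Iceberg table spec: within the same partition, an equality delete applies to data files with a strictly smaller data sequence number, and a position delete to data files with a smaller or equal one. If no live data file in the partition satisfies that, the delete file is dangling. A minimal Python sketch of that partition-level check; the types are hypothetical, it is not necessarily how the proposed procedure works, and it ignores global equality deletes and position deletes whose specific target file was removed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DeleteFile:
    path: str
    partition: str
    content: str          # "position" or "equality"
    sequence_number: int


def dangling_deletes(deletes: List[DeleteFile],
                     live_data_seq: Dict[str, List[int]]) -> List[str]:
    """Delete files that can no longer apply to any live data file in their partition.

    live_data_seq maps partition -> data sequence numbers of its live data files.
    """
    dangling = []
    for d in deletes:
        seqs = live_data_seq.get(d.partition, [])
        if d.content == "equality":
            applies = any(s < d.sequence_number for s in seqs)
        elif d.content == "position":
            applies = any(s <= d.sequence_number for s in seqs)
        else:
            raise ValueError(f"unknown delete content: {d.content}")
        if not applies:
            dangling.append(d.path)
    return dangling


# Made-up state: partition p1's live data files have sequence numbers 5 and 7.
deletes = [
    DeleteFile("eq-5", "p1", "equality", 5),
    DeleteFile("pos-5", "p1", "position", 5),
    DeleteFile("eq-6", "p1", "equality", 6),
    DeleteFile("pos-p2", "p2", "position", 9),
]
print(dangling_deletes(deletes, {"p1": [5, 7]}))  # ['eq-5', 'pos-p2']
```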

Re: [DISCUSS] October board report

2022-10-12 Thread Szehon Ho
Turoczy, Bill Zhang) - Apache Iceberg's REST Catalog - A Gateway to Enriching Data Access via the Simplicity of an HTTP Service (Sam Redai) - Iceberg's Best Secret: Exploring Metadata Tables (Szehon Ho) - Integrated Audits: Streamlined Data Observability with Apache Iceberg (Sam Redai

Re: [VOTE] Release Apache Iceberg 1.0.0 RC0

2022-10-10 Thread Szehon Ho
:26 AM Szehon Ho wrote: > Hi, > > I get a NoClassDefFoundError from IcebergSparkExtensions when running > Spark 3.3, with iceberg-spark-runtime-3.3_2.12-1.0.0.jar. I noticed this > jar doesn't contain scala classes, unlike previous jars > iceberg-spark-runtime-3.3_2.12-0.1

Re: [VOTE] Release Apache Iceberg 1.0.0 RC0

2022-10-10 Thread Szehon Ho
Hi, I get a NoClassDefFoundError from IcebergSparkExtensions when running Spark 3.3, with iceberg-spark-runtime-3.3_2.12-1.0.0.jar. I noticed this jar doesn't contain Scala classes, unlike previous jars such as iceberg-spark-runtime-3.3_2.12-0.14.1.jar. scala> spark.sql("show databases").show

Re: Welcome Yufei Gu as a committer

2022-08-25 Thread Szehon Ho
Congratulations, Yufei! Thanks Szehon > On Aug 25, 2022, at 4:20 PM, Anton Okolnychyi > wrote: > > I’d like to welcome Yufei Gu as a committer to the project. > > Thanks for all your hard work, Yufei! > > - Anton

Re: Welcome Fokko Driesprong as a committer!

2022-08-22 Thread Szehon Ho
Congratulations! Szehon On Mon, Aug 22, 2022 at 12:25 PM Péter Váry wrote: > Congratulations Fokko! > > On Mon, Aug 22, 2022, 16:37 Jahagirdar, Amogh > wrote: > >> Congratulations Fokko! >> >> >> >> *From: *Gabor Kaszab >> *Reply-To: *"dev@iceberg.apache.org" >> *Date: *Monday, August 22,

Re: [DISCUSS] Automatic Code Formatting / Code Style / Enforcing Code Style

2022-07-29 Thread Szehon Ho
Thanks for the auto-formatting initiative; I think it's really a time saver. I also agree about the line length: it would be better to keep it at 120, and it's a bummer it has to be reduced to 100 now. Looking at palantir-format, I actually like some of their format choices like line-length and also

Re: [VOTE] Release Apache Iceberg 0.14.0 RC1

2022-07-15 Thread Szehon Ho
+1 (non-binding) - Verified signature - Verified checksum - RAT check - Could not find Apache license headers on iceberg-build.properties (as mentioned by Ryan) - Ran tests - Same error mentioned by John: org.apache.iceberg.aws.s3.TestS3FileIO >

Re: [VOTE] Adopt Puffin format as a file format for statistics and indexes

2022-06-09 Thread Szehon Ho
+1, it's an exciting step for Iceberg; I look forward to all the new statistics and secondary indices it will allow. Had a few questions about what the reference to Puffin file(s) will be in the Iceberg spec, but it's orthogonal to the Puffin file format itself. Thanks, Szehon On Thu, Jun 9, 2022 at

Re: [VOTE] Release Apache Iceberg 0.13.2 RC1

2022-06-06 Thread Szehon Ho
+1 (non-binding) 1. Verified signatures 2. Verified checksums 3. RAT checks 4. Build and test 5. Tested with Spark 3.2, create a table and run a few queries Thanks Szehon On Mon, Jun 6, 2022 at 10:46 AM Daniel Weeks wrote: > +1 (binding) > > verified

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-29 Thread Szehon Ho
On the other topic, the PR for the 0.13 branch is merged: https://github.com/apache/iceberg/pull/4890. My preference would be to include this in a new RC to solve the aforementioned issue: https://github.com/apache/iceberg/issues/4718. Thanks, Szehon On Sun, May 29, 2022 at 2:59 PM Szehon Ho wrote

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-29 Thread Szehon Ho
PikR8XEqs0YkO > wdFeyrBN22jtT48jMJ4IFw4odabqOqBn6Wazx3tBg0ZMTxn/i2H4tHpe78RIj/7Z > 7eLhkMY0meA64TMBCc0aS3ffCnJzetWOSpgjv9o= > =gy3b > -----END PGP PUBLIC KEY BLOCK----- > > > > On May 28, 2022, at 2:04 PM, Szehon Ho wrote: > > Hi > > For gpg verify KEYS i get:

Re: [VOTE] Release Apache Iceberg 0.13.2 RC0

2022-05-28 Thread Szehon Ho
Hi, For gpg --verify against KEYS, I get: gpg: Can't check signature: No public key. I imported the latest keys and do see the key for: uid Russell Spitzer (CODE SIGNING KEY) sub rsa4096 2022-05-26 [E], but maybe there is no public key? Maybe I am missing something obvious. Also wanted to ask, can we get this

Re: Welcome Szehon Ho as a committer!

2022-03-11 Thread Szehon Ho
ufei Gu >>>> <mailto:flyrain...@gmail.com>> wrote: > >>>>> > >>>>> Congratulations Szehon! > >>>>> Best, > >>>>> > >>>>> Yufei > >>>>>

Re: Getting last modified timestamp/other stats per partition

2022-03-07 Thread Szehon Ho
*From:* Mayur Srivastava >> *Sent:* Thursday, February 24, 2022 7:27 AM >> *To:* dev@iceberg.apache.org >> *Subject:* RE: Getting last modified timestamp/other stats per partition >> >> >> >> Thanks Szehon. I’ll give this a try. >> >> >> >>

Re: Getting last modified timestamp/other stats per partition

2022-02-23 Thread Szehon Ho
Hi, Probably the metadata tables can help with this. For the size/num_rows of partitions, you can query the files table, https://iceberg.apache.org/docs/latest/spark-queries/#files. (Because Iceberg keeps stats for files, and not necessarily for partitions.) SELECT partition, sum(file_size_in_bytes),
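Because the statistics live at the file level, per-partition numbers are an aggregation over the files metadata table. A plain-Python sketch of that roll-up over made-up rows, using the `partition`, `record_count` and `file_size_in_bytes` column names of the files table; the query in the excerpt is truncated, so the aggregates it actually used are not reproduced here. If delete files appear in the table, their record counts are not live rows, so filtering to data files first would be needed for accurate row counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Made-up files-table rows: (partition, record_count, file_size_in_bytes).
FileRow = Tuple[str, int, int]


def partition_stats(rows: Iterable[FileRow]) -> Dict[str, Dict[str, int]]:
    """Roughly: SELECT partition, count(*), sum(record_count), sum(file_size_in_bytes)
    FROM db.table.files GROUP BY partition"""
    stats: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"file_count": 0, "record_count": 0, "file_size_in_bytes": 0})
    for partition, records, size in rows:
        s = stats[partition]
        s["file_count"] += 1
        s["record_count"] += records
        s["file_size_in_bytes"] += size
    return dict(stats)


print(partition_stats([("p=1", 10, 100), ("p=1", 5, 50), ("p=2", 1, 10)]))
```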

Re: [VOTE] Release Apache Iceberg 0.13.0 RC2

2022-01-30 Thread Szehon Ho
+1 (non-binding) Verified signature Verified checksum RAT check Built and ran tests; all succeeded after some temporary local HMS timeouts Tested relevant jar with Spark 3.2, created various tables and ran queries Thanks Szehon On Fri, Jan 28, 2022 at 12:19 PM Russell Spitzer wrote: > +1 > All

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
; > Is there a general order-of-magnitude target number of `manifest_file` > structs? Presumably that would dictate when one would want to merge > manifest files and/or data files. > > Thanks again! > ggg > > > On Fri, Jan 7, 2022 at 11:41 AM Szehon Ho wrote: > >>

Re: Number of entries in manifest-list

2022-01-07 Thread Szehon Ho
Hi, There is one manifest entry per data file or delete file, so it depends on how many data files/delete files your table has. The number of files is controlled mostly by the parallelism of the job that writes the table, though there are Iceberg RewriteDataFile utilities that can compact as well (as in

Re: Welcome new PMC members!

2021-11-18 Thread Szehon Ho
Awesome, congratulations Jack and Russell! > On 18 Nov 2021, at 09:30, Ryan Murray wrote: > > Congratulations both! Well deserved! > > On Thu, 18 Nov 2021, 09:19 Omar Al-Safi, > wrote: > Congrats both of you! > > On Thu, Nov 18, 2021 at 8:31 AM Eduard Tudenhoefner

Re: [DISCUSS] Iceberg roadmap

2021-09-10 Thread Szehon Ho
Hi I also missed the last sync, and wanted to add two things if possible. Thanks, Szehon Priority 2: - Core: Predicate pushdown for remaining Metadata tables [medium] - Core/Spark: Support serializable isolation for ReplacePartitions / Insert Overwrite [medium] On Fri, Sep 10, 2021

Re: Iceberg python library sync

2021-08-12 Thread Szehon Ho
+1, would love to listen in as well Thanks, Szehon > On 12 Aug 2021, at 12:48, Arthur Wiedmer > wrote: > > Hi Jun, > > Please add me as well! > > Best, > Arthur > > > > On Thu, Aug 12, 2021 at 12:19 AM Jun H. > wrote: > Hi everyone, > > Since early this year,

Re: Subject: [VOTE] Release Apache Iceberg 0.12.0 RC3

2021-08-10 Thread Szehon Ho
+1 (non-binding) * Checked Signature Keys * Verified Checksum * RAT checks * Build and run tests, most functionality passes (also timeout errors on Hive-MR) Thanks Szehon On Tue, Aug 10, 2021 at 1:40 AM Ryan Murray wrote: > +1 (non-binding) > > * Verify Signature Keys > * Verify Checksum > *

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
Got it, I somehow thought changes were manually cherry-picked, thanks for clarification. Thanks Szehon > On 9 Aug 2021, at 13:34, Ryan Blue wrote: > > Szehon, I think that should make it because the RC will come from master. > > On Mon, Aug 9, 2021 at 12:56 PM Szehon Ho w

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-09 Thread Szehon Ho
> configurations using configs prefixed with > "spark.sql.catalog.(catalog-name).hadoop." > - one of my contributions to this release that has been asked about by > several customers internally > - tested using `spark.sql.catalog.(catalog-name).hadoop.fs.s3a.impl`
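The feature described in this excerpt lets per-catalog Hadoop settings be passed as Spark configs with the `spark.sql.catalog.(catalog-name).hadoop.` prefix. A Python sketch of the prefix-stripping idea, assuming that everything after the prefix is used as the Hadoop configuration key for that catalog only; the function name and the `some.other.FileSystem` value are illustrative, not Iceberg's code.

```python
from typing import Dict


def catalog_hadoop_overrides(spark_conf: Dict[str, str], catalog: str) -> Dict[str, str]:
    """Collect Hadoop settings scoped to one catalog by stripping the prefix."""
    prefix = f"spark.sql.catalog.{catalog}.hadoop."
    return {key[len(prefix):]: value
            for key, value in spark_conf.items()
            if key.startswith(prefix)}


conf = {
    "spark.sql.catalog.my_catalog": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.my_catalog.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "spark.sql.catalog.other.hadoop.fs.s3a.impl": "some.other.FileSystem",
}
print(catalog_hadoop_overrides(conf, "my_catalog"))
# {'fs.s3a.impl': 'org.apache.hadoop.fs.s3a.S3AFileSystem'}
```

Scoping by catalog name means two catalogs in the same Spark session can point at differently configured file systems without touching the global Hadoop configuration.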

Re: [VOTE] Release Apache Iceberg 0.12.0 RC2

2021-08-06 Thread Szehon Ho
+1 (non-binding) * Verify Signature Keys * Verify Checksum * dev/check-license * Build * Run tests (though some timeout failures, on Hive MR test..) Thanks Szehon On Thu, Aug 5, 2021 at 2:23 PM Daniel Weeks wrote: > +1 (binding) > > I verified sigs/sums, license, build, and test > > -Dan > >

Re: Serializable isolation for insert overwrites?

2021-07-20 Thread Szehon Ho
d I can point you in the > right direction. > > Ryan > > On Tue, Jul 20, 2021 at 4:20 PM Szehon Ho wrote: > >> Hi, >> >> Does anyone know if its feasible to consider making Spark's "insert >> overwrite" implement serializable transaction, like

Serializable isolation for insert overwrites?

2021-07-20 Thread Szehon Ho
Hi, Does anyone know if it's feasible to consider making Spark's "insert overwrite" implement serializable isolation, like delete, update, merge? Maybe at least for "overwrite by filter"; then it can narrow down the conflict checks needed on the commitWithSerializableTransaction side. I don't

Re: Iceberg 0.12.0 Release Plan

2021-07-19 Thread Szehon Ho
>3. #2284 Core: reassign the partition field IDs and reuse any existing >IDs <https://github.com/apache/iceberg/pull/2284> > > #2284 is in review. > > Ryan said he would take a look at #2308. > > @Szehon Ho, can you please confirm whether or not > you're wor

Re: Welcoming Jack Ye as a new committer!

2021-07-05 Thread Szehon Ho
Congratulations Jack! > On 5 Jul 2021, at 16:53, Jun H. wrote: > > Congratulations! > > >> On Jul 5, 2021, at 4:14 PM, Russell Spitzer >> wrote: >> >> Congratulations! >> >> On Mon, Jul 5, 2021 at 3:21 PM karuppayya > > wrote: >> Congratulations Jack!

Re: Welcoming OpenInx as a new PMC member!

2021-06-29 Thread Szehon Ho
Congrats Zheng! > On 29 Jun 2021, at 14:02, Anton Okolnychyi > wrote: > > Well deserved! Congrats! > >> On 29 Jun 2021, at 13:56, Jack Ye > > wrote: >> >> Congratulations!!! >> >> On Tue, Jun 29, 2021 at 1:55 PM Ryan Murray > > wrote: >>

Re: Spark configuration on hive catalog

2021-04-22 Thread Szehon Ho
Hi Huadong, nice to see you again :). The syntax in spark-sql is ‘insert into .. …’; here you defined your db as a catalog? You just need to define one catalog and use it when referring to your table. > On 22 Apr 2021, at 07:34, Huadong Liu wrote: > > Hello Iceberg Dev, > > I am not

Re: Welcoming Russell Spitzer as a new committer

2021-03-29 Thread Szehon Ho
Awesome, well-deserved, Russell! Szehon > On 29 Mar 2021, at 18:10, Holden Karau wrote: > > Congratulations Russel! > > On Mon, Mar 29, 2021 at 9:10 AM Anton Okolnychyi > wrote: > Hey folks, > > I’d like to welcome Russell Spitzer as a new committer to the project! > > Thanks for all your

Re: Welcoming Ryan Murray as a new committer!

2021-03-29 Thread Szehon Ho
That’s awesome, great work Ryan. Szehon > On 29 Mar 2021, at 18:08, Anton Okolnychyi > wrote: > > Hey folks, > > I’d like to welcome Ryan Murray as a new committer to the project! > > Thanks for all the hard work, Ryan! > > - Anton

Re: Welcoming Yan Yan as a new committer!

2021-03-24 Thread Szehon Ho
Nice, congratulations! > On 24 Mar 2021, at 11:37, Marton Bod wrote: > > Congratulations, well done! > > On Wed, 24 Mar 2021 at 11:32, Peter Vary wrote: > Congratulations Yan! > >> On Mar 24, 2021, at 05:43, Yufei Gu > > wrote: >> >> Congratulations, Yan! >> >>