Re: [I] make to_duckdb accept multiple tables [iceberg-python]

via GitHub Sat, 07 Dec 2024 11:47:34 -0800


ischwart1 commented on issue #408:
URL: https://github.com/apache/iceberg-python/issues/408#issuecomment-2525291565


   @djouallah 
   
   Apparently you can pass a DuckDB connection to `to_duckdb()` as the second 
   argument; that way you can chain many tables together on one connection:
   
   ```python
   
   with (
       catalog.load_table(f"{config.lake.namespace}.{config.bronze.map_id}")
       .scan()
       .to_duckdb("bronze_map_id") as con1
   ):
       with (
           catalog.load_table(f"{config.lake.namespace}.{config.bronze.mob}")
           .scan()
           .to_duckdb("bronze_mob", con1) as con2
       ):
           con2.sql("""--sql
           with mob as (
               select unnest(mobs).link as link from bronze_map_id
           ),
           links as (
               select
                   regexp_extract(link, '/mob/(\\d+)', ['mob_id']).mob_id as mob_id,
                   count(*) as count
               from
                   mob
               group by
                   link
           )
           select
               links.mob_id, name, count
           from
               links
           join
               bronze_mob
           on
               links.mob_id = bronze_mob.id
           """).show(max_width=1000)
   
   ```
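
The nesting above grows by one level per table. A minimal sketch of the same chaining in a loop follows; `register_tables` is a hypothetical helper (not part of pyiceberg), and it assumes, as the snippet above does, that `to_duckdb()` returns the connection the table was registered on:

```python
def register_tables(catalog, tables, connection=None):
    """Register several Iceberg tables on one DuckDB connection.

    `tables` maps the DuckDB view name to the Iceberg table identifier.
    Hypothetical helper: it only uses the calls shown in the comment above
    (`load_table`, `scan`, `to_duckdb`).
    """
    for view_name, identifier in tables.items():
        scan = catalog.load_table(identifier).scan()
        if connection is None:
            # First table: let to_duckdb() create the connection.
            connection = scan.to_duckdb(view_name)
        else:
            # Later tables: reuse the connection so all views live together.
            connection = scan.to_duckdb(view_name, connection)
    return connection
```

With a real catalog, this would be called as `con = register_tables(catalog, {"bronze_map_id": ..., "bronze_mob": ...})`, after which a single `con.sql(...)` can join both views. Passing an existing connection as the third argument adds the views to it instead of creating a new one.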


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

