nuno-faria commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3263742038
This is pretty cool! We can easily see the effects of the metadata caching:

```sql
> CREATE EXTERNAL TABLE hits STORED AS PARQUET LOCATION 'https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet';
0 row(s) fetched.
Elapsed 0.106 seconds.

Object Store Profiling
2025-09-07T12:17:04.082191800+00:00 operation=Head duration=0.056605s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.138839400+00:00 operation=Get duration=0.019972s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:04.158853600+00:00 operation=Get duration=0.019854s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet

Get Summary:
count: 2
duration min: 0.019854s
duration max: 0.019972s
duration avg: 0.019913s
size min: 8 B
size max: 34322 B
size avg: 17165 B
size sum: 34330 B

Head Summary:
count: 1
duration min: 0.056605s
duration max: 0.056605s
duration avg: 0.056605s
```

With the metadata cache, the query makes only three requests (one `Head` and two `Get`s), and the 8 B and 34322 B reads issued by `CREATE EXTERNAL TABLE` are not repeated:

```sql
> select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID | JavaEnable | Title | GoodEvent | EventTime | EventDate | CounterID | ClientIP | RegionID | UserID | CounterClass | OS | UserAgent | URL | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash | URLHash | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1 | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1 | 1373893805 | 15901 | 62 | 1388530699 | 229 | 3217804679217022550 | 0 | 2 | 5 | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0 | 10813 | 952 | 9500 | 520 | 1638 | 1658 | 37 | 15 | 7 | 373030 | 0 | 0 | 22 | 44efbfbd | 1 | 1 | 0 | 0 | | | 3830428 | -1 | 0 | | 0 | 0 | 1654 | 936 | 135 | 1373857827 | 4 | 1 | 16561 | 0 | 77696e646f7773 | 1601 | 0 | 0 | 0 | 8731137316151599477 | | 4563091 | 0 | 0 | 0 | 0 | 0 | 35 | 1373847066 | 0 | 0 | 0 | 0 | 0 | 1547096432 | -1 | -1 | -1 | 5330 | efbfbd0c | | | 0 | 0 | 0 | 190 | 987 | 55 | 35 | 0 | | 0 | | 4e481c | 0 | | | | | | | | | | | 0 | -1172318462146836803 | 2868770270353813622 | 0 |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.405 seconds.

Object Store Profiling
2025-09-07T12:17:16.132117800+00:00 operation=Head duration=0.023943s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157671300+00:00 operation=Get duration=0.028673s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:16.157998800+00:00 operation=Get duration=0.068516s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet

Get Summary:
count: 2
duration min: 0.028673s
duration max: 0.068516s
duration avg: 0.048595s
size min: 8134104 B
size max: 97589907 B
size avg: 52862005 B
size sum: 105724011 B

Head Summary:
count: 1
duration min: 0.023943s
duration max: 0.023943s
duration avg: 0.023943s
```

After disabling the metadata cache, many more requests are made (26 `Get`s instead of 2):

```sql
> set datafusion.runtime.metadata_cache_limit = '0M';
0 row(s) fetched.
Elapsed 0.000 seconds.
Object Store Profiling

> select * from hits limit 1;
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| WatchID | JavaEnable | Title | GoodEvent | EventTime | EventDate | CounterID | ClientIP | RegionID | UserID | CounterClass | OS | UserAgent | URL | Referer | IsRefresh | RefererCategoryID | RefererRegionID | URLCategoryID | URLRegionID | ResolutionWidth | ResolutionHeight | ResolutionDepth | FlashMajor | FlashMinor | FlashMinor2 | NetMajor | NetMinor | UserAgentMajor | UserAgentMinor | CookieEnable | JavascriptEnable | IsMobile | MobilePhone | MobilePhoneModel | Params | IPNetworkID | TraficSourceID | SearchEngineID | SearchPhrase | AdvEngineID | IsArtifical | WindowClientWidth | WindowClientHeight | ClientTimeZone | ClientEventTime | SilverlightVersion1 | SilverlightVersion2 | SilverlightVersion3 | SilverlightVersion4 | PageCharset | CodeVersion | IsLink | IsDownload | IsNotBounce | FUniqID | OriginalURL | HID | IsOldCounter | IsEvent | IsParameter | DontCountHits | WithHash | HitColor | LocalEventTime | Age | Sex | Income | Interests | Robotness | RemoteIP | WindowName | OpenerName | HistoryLength | BrowserLanguage | BrowserCountry | SocialNetwork | SocialAction | HTTPError | SendTiming | DNSTiming | ConnectTiming | ResponseStartTiming | ResponseEndTiming | FetchTiming | SocialSourceNetworkID | SocialSourcePage | ParamPrice | ParamOrderID | ParamCurrency | ParamCurrencyID | OpenstatServiceName | OpenstatCampaignID | OpenstatAdID | OpenstatSourceID | UTMSource | UTMMedium | UTMCampaign | UTMContent | UTMTerm | FromTag | HasGCLID | RefererHash | URLHash | CLID |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
| 8374547729199360385 | 1 | d0a2d0b5d181d1822028d0a0d0bed181d181d0b8d18f29202d20d0afd0bdd0b4d0b5d0bad181 | 1 | 1373893805 | 15901 | 62 | 1388530699 | 229 | 3217804679217022550 | 0 | 2 | 5 | 687474703a2f2f6972722e72752f696e6465782e7068703f73686f77616c62756d2f6c6f67696e2d6c656e697961373737373239342c393338333033313330 | 687474703a2f2f6b696e6f706f69736b2e72752f3f7374617465 | 0 | 10813 | 952 | 9500 | 520 | 1638 | 1658 | 37 | 15 | 7 | 373030 | 0 | 0 | 22 | 44efbfbd | 1 | 1 | 0 | 0 | | | 3830428 | -1 | 0 | | 0 | 0 | 1654 | 936 | 135 | 1373857827 | 4 | 1 | 16561 | 0 | 77696e646f7773 | 1601 | 0 | 0 | 0 | 8731137316151599477 | | 4563091 | 0 | 0 | 0 | 0 | 0 | 35 | 1373847066 | 0 | 0 | 0 | 0 | 0 | 1547096432 | -1 | -1 | -1 | 5330 | efbfbd0c | | | 0 | 0 | 0 | 190 | 987 | 55 | 35 | 0 | | 0 | | 4e481c | 0 | | | | | | | | | | | 0 | -1172318462146836803 | 2868770270353813622 | 0 |
+---------------------+------------+------------------------------------------------------------------------------+-----------+------------+-----------+-----------+------------+----------+---------------------+--------------+----+-----------+--------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------+-----------+-------------------+-----------------+---------------+-------------+-----------------+------------------+-----------------+------------+------------+-------------+----------+----------+----------------+----------------+--------------+------------------+----------+-------------+------------------+--------+-------------+----------------+----------------+--------------+-------------+-------------+-------------------+--------------------+----------------+-----------------+---------------------+---------------------+---------------------+---------------------+----------------+-------------+--------+------------+-------------+---------------------+-------------+---------+--------------+---------+-------------+---------------+----------+----------+----------------+-----+-----+--------+-----------+-----------+------------+------------+------------+---------------+-----------------+----------------+---------------+--------------+-----------+------------+-----------+---------------+---------------------+-------------------+-------------+-----------------------+------------------+------------+--------------+---------------+-----------------+---------------------+--------------------+--------------+------------------+-----------+-----------+-------------+------------+---------+---------+----------+----------------------+---------------------+------+
1 row(s) fetched.
Elapsed 0.637 seconds.

Object Store Profiling
2025-09-07T12:17:31.352590200+00:00 operation=Head duration=0.020476s path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373586300+00:00 operation=Get duration=0.017040s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373610700+00:00 operation=Get duration=0.036302s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373727700+00:00 operation=Get duration=0.056047s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373893500+00:00 operation=Get duration=0.056901s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373614300+00:00 operation=Get duration=0.059208s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669600+00:00 operation=Get duration=0.061699s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373679500+00:00 operation=Get duration=0.062037s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373698900+00:00 operation=Get duration=0.064385s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373688700+00:00 operation=Get duration=0.064557s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.390647700+00:00 operation=Get duration=0.050088s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373669100+00:00 operation=Get duration=0.071315s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373625700+00:00 operation=Get duration=0.071612s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.409928500+00:00 operation=Get duration=0.035328s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.430810100+00:00 operation=Get duration=0.019518s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.373877300+00:00 operation=Get duration=0.076532s size=8 range: bytes=174965036-174965043 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.429786100+00:00 operation=Get duration=0.021806s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.432826500+00:00 operation=Get duration=0.022541s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435378800+00:00 operation=Get duration=0.020154s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.435720300+00:00 operation=Get duration=0.021886s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438249100+00:00 operation=Get duration=0.020374s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.438117600+00:00 operation=Get duration=0.021973s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.445244800+00:00 operation=Get duration=0.019013s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.444999300+00:00 operation=Get duration=0.020614s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.441698900+00:00 operation=Get duration=0.024848s size=8134104 range: bytes=4-8134107 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.450412700+00:00 operation=Get duration=0.023832s size=34322 range: bytes=174930714-174965035 path=hits_compatible/athena_partitioned/hits_1.parquet
2025-09-07T12:17:31.465032100+00:00 operation=Get duration=0.021047s size=97589907 range: bytes=77340716-174930622 path=hits_compatible/athena_partitioned/hits_1.parquet

Get Summary:
count: 26
duration min: 0.017040s
duration max: 0.076532s
duration avg: 0.040025s
size min: 8 B
size max: 97589907 B
size avg: 4082152 B
size sum: 106135971 B

Head Summary:
count: 1
duration min: 0.020476s
duration max: 0.020476s
duration avg: 0.020476s
```

It's also cool to see that it works on the local object store, but the output appears duplicated:

```sql
> select * from t limit 1;
+---+---+
| k | v |
+---+---+
| 1 | 1 |
+---+---+
1 row(s) fetched.
Elapsed 0.002 seconds.
Object Store Profiling
2025-09-07T12:26:34.064444500+00:00 operation=Head duration=0.000192s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064800300+00:00 operation=Get duration=0.000099s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet

Get Summary:
count: 1
duration min: 0.000099s
duration max: 0.000099s
duration avg: 0.000099s
size min: 72166 B
size max: 72166 B
size avg: 72166 B
size sum: 72166 B

Head Summary:
count: 1
duration min: 0.000192s
duration max: 0.000192s
duration avg: 0.000192s

2025-09-07T12:26:34.064443400+00:00 operation=Head duration=0.000194s path=17266/datafusion-cli/t.parquet
2025-09-07T12:26:34.064799400+00:00 operation=Get duration=0.000100s size=72166 range: bytes=4-72169 path=17266/datafusion-cli/t.parquet

Get Summary:
count: 1
duration min: 0.000100s
duration max: 0.000100s
duration avg: 0.000100s
size min: 72166 B
size max: 72166 B
size avg: 72166 B
size sum: 72166 B

Head Summary:
count: 1
duration min: 0.000194s
duration max: 0.000194s
duration avg: 0.000194s
```
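The 8 B and 34322 B `Get` requests in the `hits` logs line up with the Parquet footer layout: per the Parquet format, a file ends with the serialized metadata, a 4-byte little-endian metadata length, and the 4-byte magic `PAR1` (the file also starts with `PAR1`, which is presumably why the data reads begin at byte 4). Without the metadata cache, that same pair of reads appears 12 times in the uncached run. The sketch below is a hypothetical Python reader, not DataFusion's Rust implementation; `read_parquet_footer` and `RecordingStore` are made-up names, and the file size of 174965044 bytes is an assumption derived from the last logged byte offset (174965043) plus one.

```python
import struct
from typing import Callable, List, Tuple

MAGIC = b"PAR1"
TAIL_LEN = 8  # 4-byte little-endian metadata length + 4-byte "PAR1" magic


def read_parquet_footer(file_size: int, get_range: Callable[[int, int], bytes]) -> bytes:
    """Hypothetical helper: fetch Parquet footer metadata with two ranged reads.

    `get_range(start, end)` returns bytes start..end inclusive, like the
    `bytes=start-end` ranges printed by the profiler.
    """
    # Request 1: the last 8 bytes (metadata length + magic).
    tail = get_range(file_size - TAIL_LEN, file_size - 1)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (metadata_len,) = struct.unpack("<I", tail[:4])
    # Request 2: the metadata block that ends right before those 8 bytes.
    end = file_size - TAIL_LEN - 1
    return get_range(end - metadata_len + 1, end)


class RecordingStore:
    """Hypothetical in-memory stand-in for an object store.

    It records every requested range and synthesizes only the footer bytes,
    so the ~175 MB file never has to exist.
    """

    def __init__(self, file_size: int, metadata_len: int) -> None:
        self.file_size = file_size
        self.metadata_len = metadata_len
        self.requests: List[Tuple[int, int]] = []

    def get_range(self, start: int, end: int) -> bytes:
        self.requests.append((start, end))
        if end == self.file_size - 1:
            # Trailing 8 bytes: little-endian length followed by the magic.
            return struct.pack("<I", self.metadata_len) + MAGIC
        return bytes(end - start + 1)  # zero-filled placeholder for the metadata


# Assumed from the log: last byte offset 174965043 => file size 174965044,
# and the metadata read was 34322 bytes.
store = RecordingStore(file_size=174965044, metadata_len=34322)
metadata = read_parquet_footer(store.file_size, store.get_range)
for start, end in store.requests:
    print(f"size={end - start + 1} range: bytes={start}-{end}")
```

It prints the same two ranges as the first `Get` pair in the `CREATE EXTERNAL TABLE` log. A metadata cache lets a reader skip both requests on later scans of the same file, which is consistent with the cached `select` issuing only the two data reads.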

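The `Get Summary` and `Head Summary` blocks in the profiling output are simple per-operation aggregates over the logged requests. As a hedged sketch (hypothetical Python, not the datafusion-cli code), the following re-derives them from the raw lines. The truncating integer `size avg` is an assumption, but it matches the logged values (for example, 105724011 B / 2 is shown as 52862005 B).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Pulls the operation, duration and (optional) size out of a profiler line.
LINE_RE = re.compile(r"operation=(\w+) duration=([\d.]+)s(?: size=(\d+))?")


def summarize(lines: List[str]) -> Dict[str, dict]:
    """Aggregate profiler lines per operation (hypothetical re-implementation)."""
    durations: Dict[str, List[float]] = defaultdict(list)
    sizes: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a request line
        op = m.group(1)
        durations[op].append(float(m.group(2)))
        if m.group(3) is not None:
            sizes[op].append(int(m.group(3)))

    summary: Dict[str, dict] = {}
    for op, ds in durations.items():
        stats = {
            "count": len(ds),
            "duration min": min(ds),
            "duration max": max(ds),
            "duration avg": sum(ds) / len(ds),
        }
        ss = sizes.get(op, [])
        if ss:
            stats["size min"] = min(ss)
            stats["size max"] = max(ss)
            stats["size avg"] = sum(ss) // len(ss)  # assumed: truncating integer average
            stats["size sum"] = sum(ss)
        summary[op] = stats
    return summary


def render(op: str, stats: dict) -> str:
    """Format one summary in the same key/value style as the profiler output."""
    out = [f"{op} Summary:"]
    for key, value in stats.items():
        if key.startswith("duration"):
            out.append(f"{key}: {value:.6f}s")
        elif key.startswith("size"):
            out.append(f"{key}: {value} B")
        else:
            out.append(f"{key}: {value}")
    return "\n".join(out)


PATH = "hits_compatible/athena_partitioned/hits_1.parquet"
log = [
    f"2025-09-07T12:17:04.082191800+00:00 operation=Head duration=0.056605s path={PATH}",
    f"2025-09-07T12:17:04.138839400+00:00 operation=Get duration=0.019972s size=8 range: bytes=174965036-174965043 path={PATH}",
    f"2025-09-07T12:17:04.158853600+00:00 operation=Get duration=0.019854s size=34322 range: bytes=174930714-174965035 path={PATH}",
]
summary = summarize(log)
print(render("Get", summary["Get"]))
```

On the three `CREATE EXTERNAL TABLE` lines this reproduces the logged `Get Summary` (count 2, avg 0.019913s, size sum 34330 B). Fed the 26 `Get` lines of the uncached run, it should count 12 reads of 8 B, 12 of 34322 B and the two data reads, for a size sum of 106135971 B, matching that summary.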