dberenbaum commented on issue #43497: URL: https://github.com/apache/arrow/issues/43497#issuecomment-2359476905
Hi, thanks for looking into it. I think you need to set up your gcloud credentials. You should be able to do that by installing the `gcloud` CLI and then running `gcloud auth application-default login`.

I can confirm that the above code from @pitrou works fine for me:

```python
>>> import pyarrow.dataset as ds
>>> uri = "gs://datachain-demo/laion-aesthetics-csv/laion_aesthetics_1024_33M_1.csv?retry_limit_seconds=5"
>>> dataset = ds.dataset(uri, format="csv")
>>> print(dataset.head(5))
pyarrow.Table
URL: string
TEXT: string
WIDTH: double
HEIGHT: double
similarity: double
punsafe: double
pwatermark: double
AESTHETIC_SCORE: double
hash: int64
----
URL: [["https://endscan.com/media/36373/fb0bf7b2abe7acfbcf95e2a180832e43.jpg","https://static0.colliderimages.com/wordpress/wp-content/uploads/2015/03/arrow-paleyfest-john-barrowman.jpg","https://images.squarespace-cdn.com/content/v1/5bc717b929f2cc0b619dbff7/1554986901865-64XLGVFH3L1V6W0JXMBS/ke17ZwdGBToddI8pDm48kMU-brTfAKJOFGpJx6cnIMl7gQa3H78H3Y0txjaiv_0fDoOvxcdMmMKkDsyUqMSsMWxHk725yiiHCCLfrh8O1z5QPOohDIaIeljMHgDF5CVlOqpeNLcJ80NK65_fV7S1UQqOSwhaf_yYOau3t15EsUQdsqpPwT44aNDfKbHVgxfV5-NaFAjT2bqbYOqLXVJa5A/Profile+of+a+Harris+Hawk.jpg","https://media.kohlsimg.com/is/image/kohls/2656691_ALT2?wid=1024&hei=1024&op_sharpen=1","https://4.bp.blogspot.com/-o67J2rH8_eM/UWSpsLkz3eI/AAAAAAAAD9U/5g2xqqTt8-w/s1600/Dwarf+Galaxy+Chart+2+small.jpg"]]
TEXT: [["View 47 photos of this 3 bed, 4 bath, and 2,490 sqft. condo home located at 341 Mill St, Saint Paul, Minnesota 55102 is Active for $825,000.","john barrowman - photo #12","A black and white limited edition portraits of a Harris Hawk. 
Bird portraits in the Raptor series by fine art photographer Paul Coghlin.","Women's Converse Chuck Taylor All Star Madison Floral Lined Sneakers","Physicists of the Caribbean: Infographic : Dwarf Galaxy ..."]]
WIDTH: [[2080,2011,1655,1024,1600]]
HEIGHT: [[1388,3000,1055,1024,1600]]
similarity: [[0.2859024107456207,0.31411105394363403,0.3719012141227722,0.2890952229499817,0.3397912383079529]]
punsafe: [[0.00013357401,0.0062506497,0.0000014455641,0.00017145276,0.00019073486]]
pwatermark: [[0.10044884,0.73289585,0.081598304,0.18584651,0.6887595]]
AESTHETIC_SCORE: [[5.0400634,5.54457,5.9696546,5.014857,5.413278]]
hash: [[929872200875109155,8338800302313723098,7578604913656441916,3451195012265564296,-4028870017594316595]]
```

In fact, as you can see from this code block, I am even able to run `dataset.head()` successfully in the REPL. However, running the same code in a script raises a segfault. I have observed the same behavior on both Mac and Linux.
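
For anyone unsure whether the credential setup step has taken effect, here is a minimal sketch of a local check. It is not part of the original report: the helper name `find_adc_file` is hypothetical, and it assumes the usual locations that `gcloud auth application-default login` writes to (`~/.config/gcloud` on Mac and Linux, `%APPDATA%\gcloud` on Windows). It only looks for a credentials file, so it says nothing about credentials supplied by other means, and it ignores a custom `CLOUDSDK_CONFIG` directory.

```python
import os
import sys
from pathlib import Path


def find_adc_file(environ=None, home=None):
    """Return the path of an Application Default Credentials file, or None.

    Hypothetical helper for illustration; it checks for a file on disk only.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # An explicit GOOGLE_APPLICATION_CREDENTIALS setting takes precedence.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit)
        return path if path.is_file() else None

    # Otherwise look in the assumed default gcloud configuration directory.
    if sys.platform == "win32":
        appdata = environ.get("APPDATA")
        base = Path(appdata) / "gcloud" if appdata else None
    else:
        base = home / ".config" / "gcloud"
    if base is None:
        return None

    candidate = base / "application_default_credentials.json"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    found = find_adc_file()
    if found:
        print(f"ADC file: {found}")
    else:
        print("No ADC file found; try `gcloud auth application-default login`.")
```

If this reports no file, running `gcloud auth application-default login` as described above is the first thing to try before retrying the `ds.dataset(...)` call.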
