[GitHub] [arrow] thisisnic commented on issue #36807: [C++] Segfault when reading a Parquet file as a Dataset but not when read as an individual file

via GitHub Fri, 21 Jul 2023 04:18:46 -0700


thisisnic commented on issue #36807:
URL: https://github.com/apache/arrow/issues/36807#issuecomment-1645426872


   I tried again, and the file read in fine without `head()`. When I tried again
with `head()` followed by `collect()`, it read in the data successfully and then
segfaulted immediately afterwards.
   
   ```
   > open_dataset("/data/nyc-taxi/year=2016/month=11/part-0.parquet") %>% head() %>% collect()
   # A tibble: 6 × 22
     vendor_name pickup_datetime     dropoff_datetime    passenger_count
     <chr>       <dttm>              <dttm>                        <int>
   1 VTS         2016-11-10 20:14:06 2016-11-10 20:19:37               1
   2 VTS         2016-11-10 20:14:06 2016-11-10 20:43:31               1
   3 VTS         2016-11-10 20:14:06 2016-11-10 20:17:24               1
   4 VTS         2016-11-10 20:14:06 2016-11-10 20:20:12               1
   5 CMT         2016-11-10 20:14:07 2016-11-10 20:20:23               1
   6 CMT         2016-11-10 20:14:07 2016-11-10 21:13:19               2
   # ℹ 18 more variables: trip_distance <dbl>, pickup_longitude <dbl>,
   #   pickup_latitude <dbl>, rate_code <chr>, store_and_fwd <chr>,
   #   dropoff_longitude <dbl>, dropoff_latitude <dbl>, payment_type <chr>,
   #   fare_amount <dbl>, extra <dbl>, mta_tax <dbl>, tip_amount <dbl>,
   #   tolls_amount <dbl>, total_amount <dbl>, improvement_surcharge <dbl>,
   #   congestion_surcharge <dbl>, pickup_location_id <int>,
   #   dropoff_location_id <int>
   > 
   Thread 11 "R" received signal SIGSEGV, Segmentation fault.
   [Switching to Thread 0x7fffbffff640 (LWP 480578)]
   0x0000000000000000 in ?? ()
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
