jimjam-slam commented on issue #45901:
URL: https://github.com/apache/arrow/issues/45901#issuecomment-2785340576

   @amoeba Apologies for the slow reply! For R users, `{readr}` supports ragged
   CSVs. It raises a warning about the short rows but still fills the missing
   fields with a type-specific `NA`:
   
   ```r
   csv_string <- "name,group,score
   North,A,5
   East,A
   West,B,7
   South
   "
   
   df <- readr::read_csv(csv_string)
   #
   # Rows: 4 Columns: 3
   # ── Column specification ──────────────────────────────────────────────────────────────
   # Delimiter: ","
   # chr (2): name, group
   # dbl (1): score
   # 
   # ℹ Use `spec()` to retrieve the full column specification for this data.
   # ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
   # Warning message:
   # One or more parsing issues, call `problems()` on your data frame for details, e.g.:
   #   dat <- vroom(...)
   #   problems(dat)
   
   df
   # # A tibble: 4 × 3
   #   name  group score
   #   <chr> <chr> <dbl>
   # 1 North A         5
   # 2 East  A        NA
   # 3 West  B         7
   # 4 South NA       NA
   
   readr::problems(df)
   # # A tibble: 2 × 5
   #     row   col expected  actual    file
   #   <int> <int> <chr>     <chr>     <chr>
   # 1     3     2 3 columns 2 columns /private/var/folders/v3/ktxzq5ks2cz4xbvn975sp…
   # 2     5     1 3 columns 1 columns /private/var/folders/v3/ktxzq5ks2cz4xbvn975sp…
   ```
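   
   If it helps to see which lines are short before parsing, here is a minimal
   base R sketch (not part of `{readr}`) using `utils::count.fields()`. The
   names `csv_lines`, `n_fields` and `short_lines` are only illustrative, and
   treating the header width as the expected width is an assumption:
   
   ```r
   # Same data as above, built line by line so indentation cannot leak into the text.
   csv_lines <- c("name,group,score", "North,A,5", "East,A", "West,B,7", "South")
   
   # count.fields() reports how many delimited fields each line contains.
   con <- textConnection(csv_lines)
   n_fields <- utils::count.fields(con, sep = ",")
   close(con)
   
   # Assumption: the header line defines the expected number of columns.
   expected <- n_fields[1]
   
   # Line numbers (header = line 1) that have fewer fields than the header.
   short_lines <- which(n_fields < expected)
   ```
   
   For this input `short_lines` is `c(3, 5)`, the same line numbers that
   `problems()` reports in its `row` column.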
   
   `{readr}` also has `melt_csv()`, which is designed specifically for ragged
   data. The function was deprecated in readr 2.0.0 and moved into the `{meltr}`
   package, but it is currently still available in `{readr}`:
   
   ```r
   df2 <- readr::melt_csv(csv_string)
   # Warning message:
   # `melt_csv()` was deprecated in readr 2.0.0.
   # ℹ Please use `meltr::melt_csv()` instead
   # This warning is displayed once every 8 hours.
   # Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
   
   df2
   # # A tibble: 12 × 4
   #      row   col data_type value
   #    <dbl> <dbl> <chr>     <chr>
   #  1     1     1 character name
   #  2     1     2 character group
   #  3     1     3 character score
   #  4     2     1 character North
   #  5     2     2 character A
   #  6     2     3 integer   5
   #  7     3     1 character East
   #  8     3     2 character A
   #  9     4     1 character West
   # 10     4     2 character B
   # 11     4     3 integer   7
   # 12     5     1 character South
   ```
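   
   Because the melted form keeps one record per cell, the raggedness can be
   recovered from it directly. A rough base R sketch, assuming a hand-typed
   copy of the `row`, `col` and `value` columns shown above (the names
   `melted`, `fields_per_row` and `wide` are hypothetical, and nothing here
   calls `{meltr}`):
   
   ```r
   # Hand-typed copy of the melted output above (data_type column omitted).
   melted <- data.frame(
     row = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5),
     col = c(1, 2, 3, 1, 2, 3, 1, 2, 1, 2, 3, 1),
     value = c("name", "group", "score", "North", "A", "5",
               "East", "A", "West", "B", "7", "South"),
     stringsAsFactors = FALSE
   )
   
   # Number of cells actually present on each line of the file.
   fields_per_row <- tabulate(melted$row)
   
   # Rebuild a rectangular character matrix; cells that never appeared stay NA.
   wide <- matrix(NA_character_, nrow = max(melted$row), ncol = max(melted$col))
   wide[cbind(melted$row, melted$col)] <- melted$value
   ```
   
   Everything in `wide` is still character, so type conversion would be a
   separate step.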


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

