[GitHub] [arrow] elgabbas commented on issue #34291: [R] Can not parse file

via GitHub Wed, 22 Feb 2023 05:19:26 -0800


elgabbas commented on issue #34291:
URL: https://github.com/apache/arrow/issues/34291#issuecomment-1440005031


   Thanks @eitsupi 
   
   As I mentioned in my previous message, the problem seems to be caused by an extra, unnecessary quotation mark.
   If I remove it manually (second example below: `Arrow_parse_Example5.txt`), I can load the data.
   
   ```
   library(arrow)

   # This fails
   Occ <- read_delim_arrow(
     file = "https://github.com/apache/arrow/files/10804095/Arrow_parse_Example4.txt",
     delim = "\t"
   )
   # Error in `read_delim_arrow()`: ! Invalid: CSV parse error: Row #3: Expected 3 columns, got 2: 2417934775   "TEXT1 ""Quoted"" TEXT2 49.6275

   # This works
   Occ <- read_delim_arrow(
     file = "https://github.com/apache/arrow/files/10804096/Arrow_parse_Example5.txt",
     delim = "\t"
   )
   ```
   The only difference between the two files is the removal of the extra double quotation mark.
   
   Using `quote = ""`, I was able to work around this specific issue, but this is what the data look like now (which is not neat!):
   ```
   Occ <- read_delim_arrow(
     file = "https://github.com/apache/arrow/files/10804095/Arrow_parse_Example4.txt",
     delim = "\t",
     quote = ""
   )
   # A tibble: 2 × 3
   # V1 V2                                V3
   # 2417934775 "TEXT1\"\"NoQuoted\"\" TEXT2"   49.6
   # 2417934775 "\"TEXT1 \"\"Quoted\"\" TEXT2"  49.6
   ```
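
One possible way to tidy the text column after reading with `quote = ""` is sketched below in base R. This is only an illustration, not something Arrow does for you: `occ` is a hand-typed stand-in for the tibble printed above, `clean_quotes()` is a hypothetical helper, and it assumes that a leading `"` is always stray and that every `""` stands for a single literal `"`.

```r
# Hand-typed stand-in for the result printed above (assumption: values copied
# from the printed tibble, not re-read from the file).
occ <- data.frame(
  V1 = c(2417934775, 2417934775),
  V2 = c('TEXT1""NoQuoted"" TEXT2', '"TEXT1 ""Quoted"" TEXT2'),
  V3 = c(49.6, 49.6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop one leading quote, then collapse doubled quotes.
clean_quotes <- function(x) {
  x <- sub('^"', "", x)              # remove a leading double quote, if present
  gsub('""', '"', x, fixed = TRUE)   # turn each "" into a single "
}

occ$V2 <- clean_quotes(occ$V2)
print(occ$V2)
```

Fields that are legitimately wrapped in a matching pair of quotes would need extra handling (the closing quote is left in place here), so this only fits data shaped like the two rows above.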
   

