Taewoo helped me look into the issue. To close out this discussion: the problem was that I was using an old Asterix version. The current master branch parses CSV files properly.
Chen

On Sun, Jul 26, 2015 at 11:25 PM, Taewoo Kim <[email protected]> wrote:

> @Chen: the format of your data file is not correct. According to the CSV
> RFC, the opening quote must immediately follow the delimiter (,). However,
> in your example, a white space sits between them. I saw the following
> error message, which complains about the file format. After removing the
> white space after the delimiter, it worked fine. So, if you correct the
> file format, it should work.
>
> At record: 1, field#: 2 - a quote enclosing a field needs to be placed in
> the beginning of that field. [IOException]
>
> [ { "id": 14i32, "authors": "John Smith, Mary Reeve" }
> ]
>
> Best,
> Taewoo
>
> On Sun, Jul 26, 2015 at 10:47 PM, Chen Li <[email protected]> wrote:
>
>> I added the following line
>>
>> ("quote"="\"")
>>
>> to the load statement, but the problem remains: it mistakenly used the
>> "," in the "authors" field to break the record.
>>
>> @Taewoo: can you try the simple AQL example I included in this thread
>> to see if it can parse the quoted field correctly?
>>
>> Chen
>>
>> On Sun, Jul 26, 2015 at 1:25 PM, Taewoo Kim <[email protected]> wrote:
>> > We have test cases for this case. They are located in
>> > asterix-app/src/test/resources/runtimets/queries/load/. The documentation
>> > is in /asterix-doc/src/site/markdown/csv.md. The additional syntax for
>> > CSV is fairly simple. You just have two additional parameters - "quote"
>> > and "header". Refer to the file for more details.
>> >
>> > Best,
>> > Taewoo
>> >
>> > On Sat, Jul 25, 2015 at 11:30 PM, Chen Li <[email protected]> wrote:
>> >
>> >> @Taewoo: I tried it and it has the same problem. Do you have a test
>> >> case for this feature? Also, do we have documentation for this syntax?
>> >>
>> >> Chen
>> >>
>> >> On Sat, Jul 25, 2015 at 10:52 PM, Taewoo Kim <[email protected]> wrote:
>> >> > The URL is https://asterixdb.ics.uci.edu/documentation/aql/primer.html.
>> >> >
>> >> > It should look like this:
>> >> >
>> >> > ////
>> >> > use dataverse pubs;
>> >> >
>> >> > create type PaperType as open {
>> >> >   id: int32,
>> >> >   authors: string
>> >> > }
>> >> >
>> >> > create dataset Papers(PaperType) primary key id;
>> >> >
>> >> > load dataset Papers
>> >> > using localfs
>> >> > (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
>> >> > ("format"="delimited-text"),
>> >> > ("delimiter"=","));
>> >> >
>> >> > for $paper in dataset('Papers')
>> >> > return $paper;
>> >> >
>> >> > Best,
>> >> > Taewoo
>> >> >
>> >> > On Sat, Jul 25, 2015 at 10:47 PM, Chen Li <[email protected]> wrote:
>> >> >
>> >> >> @Taewoo: can you send me the syntax or the documentation URL to show
>> >> >> the syntax?
>> >> >>
>> >> >> Chen
>> >> >>
>> >> >> On Sat, Jul 25, 2015 at 3:27 PM, Taewoo Kim <[email protected]> wrote:
>> >> >> > Can you try to load it into an internal dataset? I think I have
>> >> >> > implemented the "comma between the comma (delimiter)" when modifying
>> >> >> > the delimited data parser. And Chris also modified that part, too.
>> >> >> > If it doesn't work, I can look at the issue.
>> >> >> >
>> >> >> > Best,
>> >> >> > Taewoo
>> >> >> >
>> >> >> > On Sat, Jul 25, 2015 at 1:51 PM, Chen Li <[email protected]> wrote:
>> >> >> >
>> >> >> >> Not sure if this topic was discussed before. I was trying to load
>> >> >> >> an external CSV file using "," as the delimiter. But the engine
>> >> >> >> failed to read a file with the following single record:
>> >> >> >>
>> >> >> >> 14, "John Smith, Mary Reeve"
>> >> >> >>
>> >> >> >> use dataverse pubs;
>> >> >> >>
>> >> >> >> create type PaperType as open {
>> >> >> >>   id: int32,
>> >> >> >>   authors: string
>> >> >> >> }
>> >> >> >>
>> >> >> >> create external dataset Papers(PaperType)
>> >> >> >> using localfs
>> >> >> >> (("path"="127.0.01:///Users/chenli/tmp/asterix-data/papers.csv"),
>> >> >> >> ("format"="delimited-text"),
>> >> >> >> ("delimiter"=","));
>> >> >> >>
>> >> >> >> for $paper in dataset('Papers')
>> >> >> >> return $paper;
>> >> >> >>
>> >> >> >> The following is the output, which shows that the comma in the
>> >> >> >> authors field was incorrectly used to break the field. Any idea
>> >> >> >> about how to fix it?
>> >> >> >>
>> >> >> >> Output
>> >> >> >> Results:
>> >> >> >>
>> >> >> >> { "id": 14, "authors": " \"John Smith" }
>> >> >> >>
>> >> >> >> Duration of all jobs: 0.091 sec
>> >> >> >>
>> >> >> >> Success: Query Complete
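
For anyone who finds this thread later: the quoting rule discussed here (a quote only encloses a field when it is the first character after the delimiter) can be reproduced outside AsterixDB. The sketch below is only an analogy. It uses Python's standard csv module, not the AsterixDB delimited-data parser, and the helper name parse_line is made up for illustration. It parses the record from this thread with and without the space after the comma.

```python
import csv
import io


def parse_line(line, **kwargs):
    """Parse one CSV line with Python's csv module and return its fields.

    parse_line is a hypothetical helper for this illustration only;
    it has nothing to do with AsterixDB's own parser.
    """
    return next(csv.reader(io.StringIO(line), **kwargs))


# Record from the thread: a space sits between the delimiter and the quote.
with_space = '14, "John Smith, Mary Reeve"'
# Same record with the space removed, so the quote starts the field.
without_space = '14,"John Smith, Mary Reeve"'

# The quote is not at the start of the field, so it is treated as plain
# text and the comma inside the authors list splits the record.
fields_with_space = parse_line(with_space)

# The quote starts the field, so the embedded comma is preserved.
fields_without_space = parse_line(without_space)

# Python's reader can also be told to skip the space after the delimiter.
fields_skip_space = parse_line(with_space, skipinitialspace=True)

print(fields_with_space)     # ['14', ' "John Smith', ' Mary Reeve"']
print(fields_without_space)  # ['14', 'John Smith, Mary Reeve']
print(fields_skip_space)     # ['14', 'John Smith, Mary Reeve']
```

The first result has the same shape as the truncated "authors" value in the original report, which is why removing the space from the data file is the simplest fix. Whether AsterixDB offers an equivalent of skipinitialspace is not covered in this thread.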
