[ https://issues.apache.org/jira/browse/CSV-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Holger Stratmann updated CSV-131:
---------------------------------
    Attachment: PositionTrackingFull_v101_20140910.patch

I convinced myself and created a new patch (containing patch and tests) that includes a way to set the character position in the parser. The returned records are now identical to the ones we get when we start reading at the beginning.
Take a look and let me know which one you like better.

> save positions of records to enable random access
> -------------------------------------------------
>
>                 Key: CSV-131
>                 URL: https://issues.apache.org/jira/browse/CSV-131
>             Project: Commons CSV
>          Issue Type: Improvement
>          Components: Parser
>    Affects Versions: 1.1
>            Reporter: Holger Stratmann
>            Priority: Minor
>         Attachments: PositionTrackingFull_v101_20140910.patch,
> PositionTrackingTest_20140907.patch, PositionTracking_20140907.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It would be good to have {{CSVRecord}} save its position in the source stream.
> Reason: Knowing the position of the records would enable random access to
> retrieve records from the source (after reading it once to build an index) if
> the file is too large to be read into memory (or if we don't want to read the
> full file to access a record in the middle).
> Additional info: I have created a "random access csv reader" and a "csv
> viewer" (Swing) for arbitrarily large CSV files. It requires one additional
> scan of the file to build an index (multi-byte charsets supported). The index
> can be saved to a file so it only needs to be built once. Because the lexer
> uses a BufferedReader, we need "internal information" to know where each
> record starts.
> The change to "core" is minor: one field in {{CSVRecord}}s and some
> associated methods to store the position.
> Patch will be attached.
> Code for random access (both UI and non-UI) will be proposed (and possibly
> submitted) as a separate issue.
> It could also be an independent add-on but requires this one little change
> to Commons CSV.
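
The "scan once, build an index, then jump to a record" idea from the description can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the class and method names (RecordIndexSketch, buildIndex, readRecordAt) are hypothetical, nothing here is Commons CSV API, and it works on an in-memory String with character offsets. A file-based reader would additionally need to map character positions to byte offsets for multi-byte charsets, as the description notes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of position tracking for random access.
 * Not taken from the CSV-131 patches and not part of Commons CSV.
 */
final class RecordIndexSketch {

    /** Scans the text once and returns the character offset at which each record starts. */
    static List<Integer> buildIndex(String csv) {
        List<Integer> starts = new ArrayList<>();
        boolean inQuotes = false;
        boolean atRecordStart = true;
        for (int i = 0; i < csv.length(); i++) {
            if (atRecordStart) {
                starts.add(i);
                atRecordStart = false;
            }
            char c = csv.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
            } else if (c == '\n' && !inQuotes) {
                // Only an unquoted line break ends a record.
                atRecordStart = true;
            }
        }
        return starts;
    }

    /** Returns the raw text of the record starting at the given offset, without its line break. */
    static String readRecordAt(String csv, int offset) {
        boolean inQuotes = false;
        int end = offset;
        while (end < csv.length()) {
            char c = csv.charAt(end);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == '\n' && !inQuotes) {
                break;
            }
            end++;
        }
        String raw = csv.substring(offset, end);
        // Drop the carriage return of a CRLF line ending, if present.
        return raw.endsWith("\r") ? raw.substring(0, raw.length() - 1) : raw;
    }

    public static void main(String[] args) {
        // The second record contains a quoted line break, so line numbers alone
        // would not be enough to find where the third record starts.
        String csv = "id,comment\n1,\"line one\nline two\"\n2,plain\n";
        List<Integer> index = buildIndex(csv);
        System.out.println("record starts: " + index);
        System.out.println("third record:  " + readRecordAt(csv, index.get(2)));
    }
}
```

The index is just a list of offsets, so it could be written to a file and reused, which matches the "only needs to be built once" remark above. The sketch returns the raw record text; turning it into fields would still be the parser's job.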