Hi Alan,
On Friday, March 8, 2013 4:02:18 PM UTC+1, Alan Busby wrote:
>
> Hi Bernard,
>
> I'd certainly like to add support for binary files, but as I haven't had a
> need for it myself I haven't had a good place to start.
>
> As Java NIO's mmap() doesn't support ranges over 2GB, I've had to paste
> together multiple mmap's to cover files that are larger than 2GB.
> So if a record ended up spanning two mmap()'s, you couldn't return the raw
> data as a single object without copying it into a new buffer first.
>
> Also, if you provide a fixed record size in bytes for "doing the idx
> offset maths", why do you need the end idx for the current line as well?
> For example if you say file.bin is full of records each 100B in size, and
> you ask for the 10th record; don't you already know that the length of the
> record is 100B?
>

Indeed, the correlation between txt/binary and char-delimited (i.e. \n) / fixed-length records is very strong. However, in my case I want to first handle a \n-delimited (txt) file as binary for performance reasons. The context is that I have to consider all the lines of data, but might not have to do "heavy" processing on all of them, so I want to do as little work as possible on each line (i.e. not construct any java.lang.String).

This is in no way Clojure specific. I have two Java implementations of a small Minimum Spanning Tree program:
- one constructs Strings from all the lines: https://www.refheap.com/paste/12312
- one uses offsets into a raw ByteBuffer: https://www.refheap.com/paste/12313

As most of the lines are not really processed (just sorted according to the last field), being able to peek at only the relevant bytes instead of constructing full-blown java.lang.Strings is a huge performance boost.

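
To make the offset idea concrete, here is a minimal sketch. It is not the code from the pastes above. It assumes fields are separated by single spaces and that the last field is a small non-negative decimal integer in ASCII (e.g. an edge weight). The class and method names (LastFieldSort, sortedByLastField, ...) are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: sorts the lines of a '\n'-delimited buffer by their
// last field without ever constructing a java.lang.String per line.
class LastFieldSort {

    // Parses the run of ASCII digits that ends the line [start, end),
    // reading bytes backwards. Assumes the value fits in an int.
    static int lastFieldAsInt(ByteBuffer buf, int start, int end) {
        int value = 0;
        int scale = 1;
        for (int i = end - 1; i >= start; i--) {
            byte b = buf.get(i);          // absolute get: does not move the position
            if (b < '0' || b > '9') break;
            value += (b - '0') * scale;
            scale *= 10;
        }
        return value;
    }

    // Returns one long per non-empty line: the last field in the high 32 bits,
    // the line's start offset in the low 32 bits, sorted by field then offset.
    static long[] sortedByLastField(ByteBuffer buf) {
        int limit = buf.limit();
        long[] keys = new long[16];
        int n = 0;
        int start = 0;
        while (start < limit) {
            int end = start;
            while (end < limit && buf.get(end) != '\n') end++;
            if (end > start) {            // skip empty lines
                if (n == keys.length) keys = Arrays.copyOf(keys, n * 2);
                keys[n++] = ((long) lastFieldAsInt(buf, start, end) << 32) | start;
            }
            start = end + 1;
        }
        keys = Arrays.copyOf(keys, n);
        Arrays.sort(keys);                // sorts a primitive array, no boxing
        return keys;
    }

    static int weightOf(long key) { return (int) (key >>> 32); }
    static int offsetOf(long key) { return (int) key; }

    public static void main(String[] args) {
        byte[] data = "a b 30\nc d 5\ne f 12\n".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(data);
        for (long key : sortedByLastField(buf)) {
            System.out.println(weightOf(key) + " @ offset " + offsetOf(key));
        }
    }
}
```

Packing (last field, offset) into a single long keeps the sort on a long[] of primitives. Only the lines that later need heavy processing would have to be decoded, starting from their offset.
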
FWIW, as far as performance is concerned, I draw the line not between Clojure and Java but between objects (constructed by copying some data somewhere on the heap) and arrays of primitive data types, because nowadays cache locality trumps everything (once you have gotten rid of reflection calls in Clojure, obviously).

So ideally, maybe 2 x 2 combinations, (String / offset in ByteArray) x (char delimited / fixed length), would be needed to cover all the needs.

Thanks again for sharing your library!

Cheers,

Bernard

PS: Is there a rationale for returning nil instead of the empty String "" on empty lines with iota/vec?

--
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en