Any char-based Java file I/O with arbitrary seek?

Andy Fingerhut Thu, 05 Jan 2012 14:07:41 -0800

In case this doesn't seem like a question for a Clojure group, I'll preface
it by saying that it is motivated by writing Clojure examples for a "Clojure
cookbook" [1].  So far the examples are intended to mirror the Perl examples
from the 1st edition of the Perl Cookbook [2], but the project may grow
beyond that some day (e.g. its own text, or examples specific to Clojure
constructs that have no direct analog in Perl).  In particular, Chapter 8
has examples of random access on files, and I was wondering whether I'm on
the right track with those.


I know about the class RandomAccessFile [3], which provides byte-oriented
I/O on a file, including the ability to report the current byte position
(getFilePointer) or seek to a specified byte position (seek).

I know that some of the subclasses of Reader have mark and reset methods,
but those appear to have implementation-specific limitations on how far
back you can go, and only let you mark one position.  I'm interested in
something that lets you jump anywhere.

Is the following the only way to do this with built-in Java classes?  Use a
RandomAccessFile to open and manipulate the file.  To read a character or
string, read enough bytes into a byte array and convert them to characters
with the String constructor String(byte[] bytes, int offset, int length,
String charsetName), or with the CharsetDecoder class if you want more
control over the details.  To write a string, convert it to a byte array
with String's getBytes(String charsetName) method and then write that byte
array to the file.

Are there other open source Java or Clojure libraries that can do this?

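If you know you are working with a fixed-width character encoding like
ASCII or ISO-8859, this seems relatively straightforward, since character
positions map directly to byte positions (one byte per character in those
encodings).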
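
With a single-byte encoding, character index i is simply byte offset i, so
no decoding arithmetic is needed.  A sketch assuming ISO-8859-1 (the class
and method names are hypothetical):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class FixedWidthChars {

    // ISO-8859-1 encodes every character U+0000..U+00FF in exactly one byte,
    // so a character index and a byte offset are the same number.
    private static final String CHARSET = "ISO-8859-1";

    // Read charCount characters starting at character index charIndex.
    public static String readChars(RandomAccessFile raf, long charIndex, int charCount)
            throws IOException {
        byte[] buf = new byte[charCount]; // 1 byte per char
        raf.seek(charIndex);              // byte offset == char index
        raf.readFully(buf);
        return new String(buf, CHARSET);
    }

    // Overwrite characters in place starting at charIndex.  Because byte length
    // equals char length, no other characters shift.  getBytes(String) has
    // unspecified behavior for unencodable characters, so callers should pass
    // only characters in U+0000..U+00FF.
    public static void overwriteChars(RandomAccessFile raf, long charIndex, String s)
            throws IOException {
        raf.seek(charIndex);
        raf.write(s.getBytes(CHARSET));
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("fixed-width", ".txt");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            overwriteChars(raf, 0, "abcdefghij");
            overwriteChars(raf, 3, "XYZ");
            System.out.println(readChars(raf, 0, 10)); // prints abcXYZghij
            System.out.println(readChars(raf, 2, 5));  // prints cXYZg
        }
    }
}
```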

I realize that with variable-length multi-byte character encodings like
UTF-8, it would be a bad idea to seek to an arbitrary byte position and
start decoding there, since that byte may be in the middle of a character.
I'm thinking of cases where you keep an index of byte positions you want
to jump to later, each known to be the first byte of a character in the
appropriate encoding.  I also realize that one must be very cautious when
writing to the middle of such a file, since the byte length of a string
depends on its contents.  Restricting writes to appending, or abandoning
this idea altogether in favor of a database, is usually preferable.
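
A sketch of the index-based approach for UTF-8, assuming the positions of
interest are line starts (a hypothetical choice for illustration; any
offsets recorded at character boundaries would do).  The names
Utf8LineIndex, indexLineStarts and readLineAt are made up.  It decodes with
a CharsetDecoder set to report malformed input, rather than relying on the
String constructor, whose behavior on invalid bytes is unspecified when
given a charset name, so seeking into the middle of a character fails
loudly:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

public class Utf8LineIndex {

    // One pass over the file, recording the byte offset where each line starts.
    // In UTF-8 the byte 0x0A ('\n') never appears inside a multi-byte sequence,
    // so the byte after every '\n' is the first byte of a character.
    public static List<Long> indexLineStarts(RandomAccessFile raf) throws IOException {
        List<Long> starts = new ArrayList<>();
        long len = raf.length();
        if (len > 0) starts.add(0L);
        raf.seek(0);
        byte[] buf = new byte[8192];
        long pos = 0;
        int n;
        while ((n = raf.read(buf)) != -1) {
            for (int i = 0; i < n; i++) {
                long next = pos + i + 1;
                if (buf[i] == '\n' && next < len) starts.add(next);
            }
            pos += n;
        }
        return starts;
    }

    // Seek to a previously indexed offset and decode one line strictly.
    public static String readLineAt(RandomAccessFile raf, long offset) throws IOException {
        raf.seek(offset);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int b;
        // byte-at-a-time reads keep the sketch short but are slow on long lines
        while ((b = raf.read()) != -1 && b != '\n') bytes.write(b);
        CharsetDecoder dec = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(bytes.toByteArray())).toString();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("utf8-index", ".txt");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write("alpha\nb\u00e9ta\ngamma".getBytes("UTF-8"));
            List<Long> starts = indexLineStarts(raf);
            System.out.println(starts);                         // prints [0, 6, 12]
            System.out.println(readLineAt(raf, starts.get(2))); // prints gamma
            try {
                readLineAt(raf, 8); // byte 8 is the 2nd byte of the encoding of U+00E9
            } catch (CharacterCodingException e) {
                System.out.println("not a character boundary");
            }
        }
    }
}
```

The '\n' trick works for ASCII-compatible encodings like UTF-8; it would
not work for UTF-16, where a 0x0A byte can occur inside other characters.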

Thanks,
Andy

[1] https://github.com/jafingerhut/pleac-clojure (forked from
https://github.com/mbacarella/pleac-clojure).  It is easier to understand
what the examples are intended to do if you read along with the text of the
1st edition of the Perl Cookbook.

[2] The 2003 2nd edition is here:
http://www.amazon.com/Perl-Cookbook-Second-Tom-Christiansen/dp/0596003137
It differs from the 1st edition in some section numbers and examples.  The
1st edition is available online here: http://docstore.mik.ua/orelly/perl/cookbook/

[3] http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html
