In molecular dynamics, a popular format for writing out the positions of the
atoms in a system is the xyz file format (see:
http://en.wikipedia.org/wiki/XYZ_file_format and/or
http://www.ks.uiuc.edu/Research/vmd/plugins/molfile/xyzplugin.html). The
format can store the positions of the atoms at many snapshots in time (aka
"time steps"). Each snapshot is a line giving the number of atoms, a comment
line (which may be blank), and then one line per atom. You may have anywhere
from a few to millions of atoms in your system, and thousands of time steps in
one file, so it is easy to end up with a single file that is many GB in size.
Here is a shell command that will create a very simple, very small test file
(the atom positions are completely unrealistic; the atoms all sit on top of
each other):
perl -e 'open(F, ">>test1.xyz"); for( $t= 1; $t < 11; $t = $t +1){print F
"10\n\n"; for( $a = 1; $a < 11; $a = $a + 1 ){print F "C 0.000 0.000 0.0000\n";}};
close(F);'
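Because every frame starts with its own atom count, one sequential pass can find where each time step begins without holding anything but the current line in memory. Below is a minimal sketch, assuming each frame is exactly a count line N, one comment line, and N atom lines. The names `index_frames` and `read_frame` are hypothetical, and Python is only used for illustration: it builds a list of byte offsets, one per time step, and then seeks straight to a chosen frame.

```python
import io
from typing import BinaryIO, List


def index_frames(f: BinaryIO) -> List[int]:
    """Return the byte offset at which each frame (time step) starts.

    Only each frame's count line is parsed; the comment line and the N atom
    lines are read and discarded, so memory use stays constant however large
    the file is.
    """
    offsets: List[int] = []
    while True:
        start = f.tell()
        header = f.readline()
        if not header:              # end of file
            break
        if not header.strip():      # tolerate stray blank lines between frames
            continue
        n_atoms = int(header.split()[0])
        offsets.append(start)
        for _ in range(n_atoms + 1):    # comment line + N atom lines
            if not f.readline():
                raise ValueError(f"truncated frame starting at byte {start}")
    return offsets


def read_frame(f: BinaryIO, offset: int) -> List[str]:
    """Seek straight to one frame and return its atom lines as text."""
    f.seek(offset)
    n_atoms = int(f.readline().split()[0])
    f.readline()                    # skip the comment line
    return [f.readline().decode().rstrip("\r\n") for _ in range(n_atoms)]


# Demo on an in-memory copy of test1.xyz (10 frames x 10 atoms);
# for a real file use: with open("test1.xyz", "rb") as f: ...
data = io.BytesIO((b"10\n\n" + b"C 0.000 0.000 0.0000\n" * 10) * 10)
offsets = index_frames(data)
fourth = read_frame(data, offsets[3])   # time step 4, reached by seeking
print(len(offsets), offsets[:3])        # 10 [0, 214, 428]
```

Even for a many-GB file the index is only one integer per time step, and it lets later passes jump to, or split work by, time step instead of rescanning from the top.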
Here is a shell command that will produce a more complicated file, in which
the number of atoms changes with each time step. (Depending on who wrote the
code that produced the file, there may be extra columns of data at the end of
each row, and the number of decimal places kept and the type of spacing
between elements may also vary.)
perl -e 'open(F, ">>test2.xyz"); for( $t= 1; $t < 5; $t = $t +1){my $s= $t
+ 10; print F "$s \n"; my $color = substr ("abcd efghij klmno pqrs tuv wxyz",
int(rand(10)), int(rand(10))); print F $color; print F "\n" ;for( $a
= 1; $a < (11 +$t); $a = $a + 1 ){print F "C 10.000000 10.00000 10.00000 $a\n";}};
close(F);'
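The variations mentioned above (extra trailing columns, different numbers of decimal places, different spacing) can usually be absorbed by splitting each atom line on any run of whitespace instead of relying on fixed column positions. Here is a small sketch of that idea; `Atom` and `parse_atom_line` are hypothetical names, and it assumes the first four fields are always element, x, y, z.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float
    extra: List[str]   # any trailing columns, kept as raw strings


def parse_atom_line(line: str) -> Atom:
    """Parse one atom line of an xyz frame.

    str.split() with no argument splits on any run of spaces or tabs and
    ignores the trailing newline, so spacing and decimal places do not matter.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {line!r}")
    element, x, y, z, *extra = fields
    return Atom(element, float(x), float(y), float(z), extra)


# A test2.xyz-style row with a tab and an extra fifth column.
atom = parse_atom_line("C 10.000000 10.00000\t10.00000 7\n")
print(atom)
```

The count line needs no special handling either: `int("11 \n")` ignores the trailing space that the test2 generator writes.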
OK, that is the background; now to my question. I need a way to parse
these files and group the lines into time steps. I currently have
something that works, but only when the file is relatively small, because
it reads the whole file into memory. I would like to use something like
iota that will allow me to lazily parse the file and run reducers on the
data. Any help would be really appreciated.
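The general shape of a lazy solution is a stream of lines turned into a stream of frames, with only the current frame in memory, followed by a reduction over that frame stream. The sketch below shows that shape in Python; it is not iota or Clojure reducers, and `Frame`, `iter_frames`, and `count_step` are hypothetical names. It assumes the standard layout of count line, comment line, then N atom lines, with N allowed to change from frame to frame as in test2.xyz.

```python
import io
from functools import reduce
from itertools import islice
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Frame(NamedTuple):
    comment: str
    atoms: List[str]   # raw atom lines for this time step


def iter_frames(lines: Iterable[str]) -> Iterator[Frame]:
    """Lazily group a stream of xyz lines into frames, one time step at a time.

    islice pulls exactly n_atoms lines from the shared iterator, so the next
    loop iteration starts on the following frame's count line.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue                        # skip stray blank lines between frames
        n_atoms = int(header.split()[0])
        comment = next(it, "").rstrip("\n")
        atoms = [line.rstrip("\n") for line in islice(it, n_atoms)]
        if len(atoms) != n_atoms:
            raise ValueError(f"truncated frame: expected {n_atoms}, got {len(atoms)}")
        yield Frame(comment, atoms)


def count_step(acc: Tuple[int, int], frame: Frame) -> Tuple[int, int]:
    """Reducing function: (frames seen, total atoms) -> updated totals."""
    n_frames, n_atoms = acc
    return n_frames + 1, n_atoms + len(frame.atoms)


# For a real file: with open("test2.xyz") as f: reduce(count_step, iter_frames(f), (0, 0))
sample = io.StringIO(
    "11 \nab\n" + "C 10.0 10.0 10.0 1\n" * 11
    + "12 \n\n" + "C 10.0 10.0 10.0 1\n" * 12
)
totals = reduce(count_step, iter_frames(sample), (0, 0))
print(totals)   # (2, 23)
```

One design point carries over to any language: grouping lines into frames depends on reading each count line first, so it is naturally a sequential pass. A parallel fold that splits the file at arbitrary line boundaries would cut frames apart unless the frame boundaries are located first.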
--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en