Re: parsing and chunking large xyz files

Jony Hudson Fri, 26 Dec 2014 07:33:43 -0800

I think clojure.data.csv reads CSV files lazily, line by line, so it might be 
useful to take a look at:


https://github.com/clojure/data.csv
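
Since xyz files are whitespace-separated rather than comma-separated, the same lazy, line-by-line idea can also be sketched with plain line-seq from clojure.core. This is only a rough sketch, not something tried on multi-GB files. The names parse-atom and xyz-frames are made up for illustration, and it assumes every time step is an atom-count line, one comment line, and then that many atom lines.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn parse-atom
  "Split one atom line on whitespace: element symbol, then x, y, z.
  Any extra trailing columns are kept as strings under :extra."
  [line]
  (let [[el x y z & extra] (str/split (str/trim line) #"\s+")]
    {:element el
     :pos     (mapv #(Double/parseDouble %) [x y z])
     :extra   (vec extra)}))

(defn xyz-frames
  "Lazy seq of time steps built from a (lazy) seq of lines.
  Assumption: each frame is <atom count>, <comment line>, <atom lines>."
  [lines]
  (lazy-seq
    ;; skip blank lines between frames or at the end of the file
    (when-let [[header & more] (seq (drop-while str/blank? lines))]
      (let [n                 (Integer/parseInt (str/trim header))
            [title & body]    more
            [atoms remaining] (split-at n body)]
        (cons {:n-atoms n
               :title   title
               ;; parse one frame eagerly; later frames stay unrealized
               :atoms   (mapv parse-atom atoms)}
              (xyz-frames remaining))))))

(comment
  ;; Hypothetical usage: total atom count over all time steps.
  ;; The seq must be consumed inside with-open, before the reader closes.
  (with-open [rdr (io/reader "test1.xyz")]
    (->> (line-seq rdr)
         xyz-frames
         (map :n-atoms)
         (reduce +))))
```

Frames are produced one at a time, so as long as nothing holds on to the head of the sequence, only the frame currently being processed needs to be in memory.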


Jony

On Friday, 26 December 2014 14:49:59 UTC, cej38 wrote:
>
> In molecular dynamics a popular format for writing out the positions of 
> the atoms in a system is the xyz file format (see: 
> http://en.wikipedia.org/wiki/XYZ_file_format and/or 
> http://www.ks.uiuc.edu/Research/vmd/plugins/molfile/xyzplugin.html).  The 
> format allows for storing the positions of the atoms at different snapshots 
> in time (aka "time step").  You may have a few to millions of atoms in your 
> system and you may have thousands of time steps represented in the file. 
>  It is easy to end up with a single file that is many GB in size.  Here is 
> a shell command that will create a very simple, and very small, test file 
> (note that the positions of the atoms are completely unrealistic: they are 
> all sitting on top of each other):
>
> perl -e 'open(F, ">>test1.xyz"); for( $t= 1; $t < 11; $t = $t +1){print F 
> "10\n\n"; for( $a = 1; $a < 11; $a = $a + 1 ){print F "C  0.000 0.000 
> 0.0000\n";}}; close(F);'
>
>
> Here is a shell command that will produce a more complicated file 
> structure.  Depending on who wrote the code that output the file, there 
> may be other columns of data at the end of each row, and the number of 
> decimal places kept and the type of spacing between elements may change. 
> This file has a different number of atoms at each time step:
>
> perl -e 'open(F, ">>test2.xyz"); for( $t= 1; $t < 5; $t = $t +1){my $s= $t 
> + 10; print F "$s \n"; my $color  = substr ("abcd efghij klmno pqrs tuv 
> wxyz", int(rand(10)), int(rand(10))); print F $color; print F "\n" ;for( $a 
> = 1; $a < (11 +$t); $a = $a + 1 ){print F "C    10.000000   10.00000   
> 10.00000   $a\n";}}; close(F);'
>
> Ok, that is the background to get to my question.  I need a way to parse 
> these files and group the lines into time steps.  I currently have 
> something that works, but only when the file is relatively small, because 
> it reads the whole file into memory.  I would like to use something like 
> iota that will allow me to lazily parse the file and run reducers on the 
> data.  Any help would be really appreciated.
>
>
>
>
>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
