Re: [julia-users] Re: reading compressed csv file?

2015-01-05 Thread Steven G. Johnson


On Monday, January 5, 2015 12:43:16 AM UTC-5, Jiahao Chen wrote:

 This is how I used GZip.jl in the tests for the MatrixMarket package


In the present case, it seems like it would be easier to do:

data = GZip.open(fname) do g
    readcsv(g)
end


Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread elextr


On Monday, January 5, 2015 4:46:15 PM UTC+10, ivo welch wrote:

 dear tim, lex, todd (others):  thanks for responding.  I really want 
 to learn how to preprocess input from somewhere else into the 
 readcsv() function.  it's a good starting exercise for me to learn how 
 to accomplish tasks in general.  there is so much to learn.  [I did 
 not experiment with GZip.jl --- modules are new to me, and this one is 
 not included.  I could make too many errors in this process.  It will 
 probably make the specific task easier.] 

 now, the first mistake which tripped me up for a while is that I did
 not grasp the difference between a string and a command.  that is, I
 should not have used " for my command.  I had needed to use `.  this
 is why open("echo hi") did not work, but open(`echo hi`) does.


Yep correct.
 


 x=open(`gzcat myfile.csv.gz`) 

 is a good start.  I see it contains a tuple of a Pipe and a Process. 
 this is printed by default on the command line.  I learned I can make 
 this work with 

d=readcsv( x[1] ) 


Yes
 


 but I have a whole bunch of new questions, beyond question now. 
 first, try this: 

 julia> x1=open(`gzcat d.csv.gz`)
 (Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))

 julia> x2=open(`gzcat d.csv.gz`)
 (Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))

 how strange---the claims are different.  


That may just be a sampling effect: gzcat runs in a separate process,
concurrently with the current one, so what you see depends on when you
look.  Also see below for why the first call to open(command) may have
been slower than the second: the first open may not have returned until
after gzcat had already finished, whereas the second ran much faster and
returned while gzcat was still running.
 

 even stranger, the first 
 readcsv(x2[1]) is very slow now (I am talking 3 seconds on a 3 by 4 
 data file!); but following it with readcsv(x1[1]) is fast.  I can't 
 imagine readcsv has intelligence built-in to cache past specific 
 conversions. 


No, but the first time you do anything it's possible that you are hitting
compile delays from the JIT (of open and readcsv and all their dependents);
subsequent runs are faster.
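
A rough way to see the compile delay is to time the same call twice. This is only a sketch, assuming a current Julia 1.x release (which postdates this thread). The function sumsq is hypothetical, and timings vary by machine, so no particular numbers are implied.

```julia
# Hypothetical function, used only to have something to compile.
sumsq(v) = sum(x -> x^2, v)

v = rand(1000)
first_call  = @elapsed sumsq(v)   # includes JIT compilation for Vector{Float64}
second_call = @elapsed sumsq(v)   # reuses the already-compiled method

println("first call:  ", first_call, " s")
println("second call: ", second_call, " s")
```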
 


 another strange definition from a novice perspective:  close(x1) is 
 not defined.  close(x1[1]) is.  


close() is defined for a stream, not a tuple (stream, process).
 

 julia is the first language I have 
 seen where a close(open(file)) is wrong. 


close(open(filenamestring)) is fine; close(open(command)) is not, because
open(command) returns a tuple of two things, not just the stream.  This is
Julia's primary paradigm: multiple dispatch means that the same named
function can have several methods that do different things depending on
the *type* of the arguments to the call, here a string or a command.
 

  this is esp surprising 
 because julia has the dispatch ability to understand what it could do 
 with a close(Pipe,Process) tuple. 


But only if such a close() method is defined, which it is not.  Maybe it 
should be, but open(command) is significantly less used than open(file).
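
The dispatch behaviour described above can be sketched as follows, assuming a current Julia 1.x release. The names describe and close_stream are hypothetical and are not part of Julia. The sketch only illustrates how a method is chosen from the argument type, and what a tuple-accepting method could look like if one chose to define it.

```julia
# `describe` and `close_stream` are hypothetical names used only for illustration.

# Two methods of one function; Julia picks one from the argument's type.
describe(x::AbstractString) = "string: would be treated as a file name"
describe(x::Cmd)            = "command: would be run as an external process"

println(describe("echo hi"))   # dispatches to the AbstractString method
println(describe(`echo hi`))   # dispatches to the Cmd method (the command is not run)

# A method that accepts a (stream, process)-style tuple and closes its first element.
close_stream(t::Tuple) = close(t[1])

buf = IOBuffer("a,b,c")        # stands in for the stream half of the tuple
close_stream((buf, nothing))
```

Defining a separately named function, rather than adding a method to Base.close for tuples, keeps the sketch from changing the behaviour of other code.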

Cheers
Lex

 




[julia-users] Re: reading compressed csv file?

2015-01-04 Thread elextr


On Monday, January 5, 2015 9:51:18 AM UTC+10, ivo welch wrote:


 dear julia users:  beginner's question (apologies, more will be coming). 
  it's probably obvious.

 I am storing files in compressed csv form.  I want to use the built-in 
 julia readcsv() function.  but I also need to pipe through a decompressor 
 first.  so, I tried a variety of forms, like

d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |")
d= readcsv(`/usr/bin/gzcat ./myfile.csv.gz`)

 I can type the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but
 wrapping a readcsv around it does not capture it.  how does one do this?


Can you run the command with 
open() 
http://docs.julialang.org/en/latest/stdlib/base/?highlight=spawn#Base.open 
and pass the stream it returns to readcsv?

Cheers
Lex

 


 regards,

 /iaw



Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread ivo welch
still not obvious.  readcsv does have a dispatch for a stream (good),
but I really need a popen function.
  x=readcsv(open(`gzcat myfile.csv.gz`, "r"))
is wrong.  x=run(`gzcat myfile.csv.gz`) doesn't send the output to x
for further piping as far as I can see, so readcsv(x) doesn't do it.

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Distinguished Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
Editor and Publisher, FAMe, http://www.fame-jagazine.com/






Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread Tim Holy
I wonder if the GZip.jl package would help?

--Tim




Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread elextr


On Monday, January 5, 2015 11:12:13 AM UTC+10, ivo welch wrote:

 still not obvious.  readcsv does have a dispatch for a stream (good),
 but I really need a popen function.
   x=readcsv(open(`gzcat myfile.csv.gz`, "r"))
 is wrong.  x=run(`gzcat myfile.csv.gz`) doesn't send the output to x
 for further piping as far as I can see, so readcsv(x) doesn't do it.


The documentation I linked said:

open(command, mode::AbstractString="r", stdio=DevNull)

Start running command asynchronously, and return a tuple (stream,process)

so you need to pass the stream element of the tuple to readcsv().
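
A minimal sketch of the same idea for current Julia 1.x releases follows. This is an assumption, since those releases postdate this thread. There, open(command) returns a single process object that can be read as a stream rather than a (stream, process) tuple, and readcsv has been replaced by readdlm from the DelimitedFiles standard library. The helper name readcsv_from_command is hypothetical, and gzcat is assumed to be on the PATH.

```julia
using DelimitedFiles  # standard library; provides readdlm

# Hypothetical helper (not from the thread or from Julia itself):
# run `cmd` and parse its standard output as comma-separated values.
function readcsv_from_command(cmd::Cmd)
    # The do-block form passes the process stream to the block, then
    # closes it and waits for the process to finish.
    open(cmd) do stream
        readdlm(stream, ',')
    end
end

# Usage, assuming gzcat is installed and myfile.csv.gz exists:
# d = readcsv_from_command(`gzcat myfile.csv.gz`)
```

The do-block form also takes care of closing the stream, so no separate close call is needed.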

Cheers
Lex


  



[julia-users] Re: reading compressed csv file?

2015-01-04 Thread Todd Leo
An intuitive thought is to uncompress your csv file via the bash utility
zcat, pipe it to STDIN and use readline(STDIN) in julia.
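
A sketch of this suggestion follows, assuming a current Julia 1.x release, where STDIN is spelled stdin and readdlm from the DelimitedFiles standard library takes the place of readcsv. The script name read_stdin.jl and the function parse_csv are hypothetical.

```julia
# read_stdin.jl (hypothetical file name)
using DelimitedFiles  # standard library; provides readdlm

# Parse comma-separated values from any IO stream.
parse_csv(io::IO) = readdlm(io, ',')

# In a script fed by `zcat myfile.csv.gz | julia read_stdin.jl`, one would call:
# data = parse_csv(stdin)

# The same function works on any stream, e.g. an in-memory buffer:
demo = parse_csv(IOBuffer("1,2\n3,4\n"))
println(size(demo))   # prints (2, 2)
```

Reading the whole stream with readdlm, rather than line by line with readline, gives the same matrix result that readcsv would have produced from a file.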





Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread Jiahao Chen
This is how I used GZip.jl in the tests for the MatrixMarket package

https://github.com/JuliaSparse/MatrixMarket.jl/blob/ba60e447f24938952509bb42c6d6bf9223562ef8/test/dl-matrixmarket.jl#L7

Perhaps it might be useful for you.

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory





Re: [julia-users] Re: reading compressed csv file?

2015-01-04 Thread ivo welch
dear tim, lex, todd (others):  thanks for responding.  I really want
to learn how to preprocess input from somewhere else into the
readcsv() function.  it's a good starting exercise for me to learn how
to accomplish tasks in general.  there is so much to learn.  [I did
not experiment with GZip.jl --- modules are new to me, and this one is
not included.  I could make too many errors in this process.  It will
probably make the specific task easier.]

now, the first mistake which tripped me up for a while is that I did
not grasp the difference between a string and a command.  that is, I
should not have used " for my command.  I had needed to use `.  this
is why open("echo hi") did not work, but open(`echo hi`) does.

x=open(`gzcat myfile.csv.gz`)

is a good start.  I see it contains a tuple of a Pipe and a Process.
this is printed by default on the command line.  I learned I can make
this work with

   d=readcsv( x[1] )

but I have a whole bunch of new questions, beyond question now.
first, try this:

julia> x1=open(`gzcat d.csv.gz`)
(Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))

julia> x2=open(`gzcat d.csv.gz`)
(Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))

how strange---the claims are different.  even stranger, the first
readcsv(x2[1]) is very slow now (I am talking 3 seconds on a 3 by 4
data file!); but following it with readcsv(x1[1]) is fast.  I can't
imagine readcsv has intelligence built-in to cache past specific
conversions.

another strange definition from a novice perspective:  close(x1) is
not defined.  close(x1[1]) is.  julia is the first language I have
seen where a close(open(file)) is wrong.  this is esp surprising
because julia has the dispatch ability to understand what it could do
with a close(Pipe,Process) tuple.  the same holds true for other
functions that expect a part of open.  julia should be smart enough to
know this.

regards,

/iaw


Ivo Welch (ivo.we...@gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Distinguished Professor of Finance
Anderson School at UCLA, C519
Director, UCLA Anderson Fink Center for Finance and Investments
Free Finance Textbook, http://book.ivo-welch.info/
Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
Editor and Publisher, FAMe, http://www.fame-jagazine.com/

