Re: [julia-users] Re: reading compressed csv file?
On Monday, January 5, 2015 12:43:16 AM UTC-5, Jiahao Chen wrote:
> This is how I used GZip.jl in the tests for the MatrixMarket package

In the present case, it seems like it would be easier to do:

data = GZip.open(fname) do g
    readcsv(g)
end
Re: [julia-users] Re: reading compressed csv file?
On Monday, January 5, 2015 4:46:15 PM UTC+10, ivo welch wrote:
> dear tim, lex, todd (others): thanks for responding. I really want to learn how to preprocess input from somewhere else into the readcsv() function. it's a good starting exercise for me to learn how to accomplish tasks in general. there is so much to learn. [I did not experiment with GZip.jl --- modules are new to me, and this one is not included. I could make too many errors in this process. It will probably make the specific task easier.]
>
> now, the first mistake which tripped me up for a while is that I did not grasp the difference between a string and a command. that is, I should not have used " for my command. I had needed to use `. this is why open("echo hi") did not work, but open(`echo hi`) does.

Yep, correct.

> x=open(`gzcat myfile.csv.gz`) is a good start. I see it contains a tuple of a Pipe and a Process. this is printed by default on the command line. I learned I can make this work with d=readcsv( x[1] )

Yes.

> but I have a whole bunch of new questions, beyond question now. first, try this:
>
> julia> x1=open(`gzcat d.csv.gz`)
> (Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))
> julia> x2=open(`gzcat d.csv.gz`)
> (Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))
>
> how strange---the claims are different.

That may just be a sampling effect: gzcat runs in another process, concurrently with the current one. Also see below for why the first call to open(command) may have been slower than the second. The first open probably did not return until after the other process had finished, whereas the second ran much faster and returned before the other process was done.

> even stranger, the first readcsv(x2[1]) is very slow now (I am talking 3 seconds on a 3 by 4 data file!); but following it with readcsv(x1[1]) is fast. I can't imagine readcsv has intelligence built-in to cache past specific conversions.

No, but the first time you do anything it's possible that you are hitting compile delays from the JIT (of open and readcsv and all their dependents); subsequent runs are faster.

> another strange definition from a novice perspective: close(x1) is not defined. close(x1[1]) is.

close() is defined for a stream, not a tuple (stream, process).

> julia is the first language I have seen where a close(open(file)) is wrong.

close(open(filenamestring)) is fine; close(open(command)) is not, because open(command) returns a tuple of two things, not just the stream. This is Julia's primary paradigm: multiple dispatch means that the same named function can have several methods that do different things depending on the *type* of the arguments to the call, here a string or a command.

> this is esp surprising because julia has the dispatch ability to understand what it could do with a close(Pipe,Process) tuple.

But only if such a close() method is defined, which it is not. Maybe it should be, but open(command) is significantly less used than open(file).

Cheers
Lex

> the same holds true for other functions that expect a part of open. julia should be smart enough to know this.
>
> regards, /iaw
>
> Ivo Welch (ivo@gmail.com) http://www.ivo-welch.info/
> J. Fred Weston Distinguished Professor of Finance
> Anderson School at UCLA, C519
> Director, UCLA Anderson Fink Center for Finance and Investments
> Free Finance Textbook, http://book.ivo-welch.info/
> Exec Editor, Critical Finance Review, http://www.critical-finance-review.org/
> Editor and Publisher, FAMe, http://www.fame-jagazine.com/
>
> On Sun, Jan 4, 2015 at 6:29 PM, Todd Leo sliznm...@gmail.com wrote:
>> An intuitive thought is, uncompress your csv file via bash utility zcat, pipe it to STDIN and use readline(STDIN) in julia.
>>
>> On Monday, January 5, 2015 7:51:18 AM UTC+8, ivo welch wrote:
>>> dear julia users: beginner's question (apologies, more will be coming). it's probably obvious. I am storing files in compressed csv form. I want to use the built-in julia readcsv() function. but I also need to pipe through a decompressor first. so, I tried a variety of forms, like
>>>
>>> d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |")
>>> d= readcsv(`/usr/bin/gzcat ./myfile.csv.gz`)
>>>
>>> I can type the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but wrapping a readcsv around it does not capture it. how does one do this?
>>>
>>> regards, /iaw
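Lex's point about dispatch can be illustrated with a small sketch. It assumes a current Julia 1.x release rather than the 2015-era version discussed in this thread, and close_stream is a hypothetical helper name, not a function in Base. An IOBuffer and a Symbol stand in for the Pipe and the Process:

```julia
# Hypothetical helper (not part of Base): one function name, two methods,
# selected by the type of the argument.
close_stream(s::IO) = close(s)

# Method for a (stream, anything) tuple: close only the stream element.
close_stream(t::Tuple{IO,Any}) = close_stream(t[1])

buf = IOBuffer("1,2,3\n")          # stand-in for the Pipe
pair = (buf, :process_stand_in)    # stand-in for the (Pipe, Process) tuple

close_stream(pair)                 # dispatches to the tuple method
println(isopen(buf))               # reports whether the buffer is still open
```

Without the second method, the call on the tuple would raise a MethodError, which is the situation described above for close() on the result of open(command).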
[julia-users] Re: reading compressed csv file?
On Monday, January 5, 2015 9:51:18 AM UTC+10, ivo welch wrote:
> dear julia users: beginner's question (apologies, more will be coming). it's probably obvious. I am storing files in compressed csv form. I want to use the built-in julia readcsv() function. but I also need to pipe through a decompressor first. so, I tried a variety of forms, like
>
> d= readcsv("/usr/bin/gzcat ./myfile.csv.gz |")
> d= readcsv(`/usr/bin/gzcat ./myfile.csv.gz`)
>
> I can type the file with run(`/usr/bin/gzcat ./crsp90.csv.gz`), but wrapping a readcsv around it does not capture it. how does one do this?

Can you run the command with open() http://docs.julialang.org/en/latest/stdlib/base/?highlight=spawn#Base.open and pass the stream it returns to readcsv?

Cheers
Lex

> regards, /iaw
Re: [julia-users] Re: reading compressed csv file?
still not obvious. readcsv does have a dispatch for a stream (good), but I really need a popen function. x=readcsv(open(`gzcat myfile.csv.gz`, "r")) is wrong. x=run(`gzcat myfiles.csv.gz`) doesn't send the output to x for further piping as far as I can see, so readcsv(x) doesn't do it.

/iaw

Ivo Welch (ivo.we...@gmail.com) http://www.ivo-welch.info/

On Sun, Jan 4, 2015 at 4:55 PM, ele...@gmail.com wrote:
> Can you run the command with open() http://docs.julialang.org/en/latest/stdlib/base/?highlight=spawn#Base.open and pass the stream it returns to readcsv?
>
> Cheers
> Lex
Re: [julia-users] Re: reading compressed csv file?
I wonder if the GZip.jl package would help?

--Tim

On Sunday, January 04, 2015 05:11:50 PM ivo welch wrote:
> still not obvious. readcsv does have a dispatch for a stream (good), but I really need a popen function. x=readcsv(open(`gzcat myfile.csv.gz`, "r")) is wrong. x=run(`gzcat myfiles.csv.gz`) doesn't send the output to x for further piping as far as I can see, so readcsv(x) doesn't do it.
>
> /iaw
Re: [julia-users] Re: reading compressed csv file?
On Monday, January 5, 2015 11:12:13 AM UTC+10, ivo welch wrote:
> still not obvious. readcsv does have a dispatch for a stream (good), but I really need a popen function. x=readcsv(open(`gzcat myfile.csv.gz`, "r")) is wrong. x=run(`gzcat myfiles.csv.gz`) doesn't send the output to x for further piping as far as I can see, so readcsv(x) doesn't do it.

The documentation I linked said:

    open(command, mode::AbstractString="r", stdio=DevNull)
    Start running command asynchronously, and return a tuple (stream,process)

You need to pass the stream element of the tuple to readcsv().

Cheers
Lex
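A minimal sketch of the approach Lex describes, written for a current Julia 1.x release rather than the version discussed here. Two things differ there and are assumptions relative to this thread: readcsv no longer exists (readdlm from the DelimitedFiles standard library with a ',' delimiter is used instead), and open(command) yields a single process object that can be read like a stream, not a (stream, process) tuple. The sketch also assumes a gzip executable on the PATH; the file is a throwaway temporary.

```julia
using DelimitedFiles

# Build a small gzip-compressed CSV file to read back.
csv_path = tempname() * ".csv"
write(csv_path, "1,2,3\n4,5,6\n")
run(`gzip $csv_path`)              # leaves csv_path * ".gz" on disk
gz_path = csv_path * ".gz"

# Run the decompressor as a subprocess and parse its standard output.
# The do-block form closes the stream and waits for the process to exit.
data = open(`gzip -dc $gz_path`) do io
    readdlm(io, ',')
end

println(size(data))
rm(gz_path)                        # clean up the temporary file
```

On the 2015-era version in this thread, the equivalent step is the one already given above: take the first element of the tuple returned by open(command) and pass it to readcsv().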
[julia-users] Re: reading compressed csv file?
An intuitive thought is, uncompress your csv file via bash utility *zcat*, pipe it to STDIN and use *readline(STDIN)* in julia.
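Todd's suggestion of reading from standard input can be sketched as follows, again assuming a current Julia 1.x release (where the stream is spelled stdin in lowercase and readdlm from DelimitedFiles stands in for readcsv). parse_csv and script.jl are hypothetical names. To keep the block runnable on its own, an in-memory buffer takes the place of standard input:

```julia
using DelimitedFiles

# Hypothetical helper: parse comma-separated values from any readable stream.
parse_csv(io::IO) = readdlm(io, ',')

# In a script fed by a shell pipeline such as
#     gzip -dc myfile.csv.gz | julia script.jl
# the call would be parse_csv(stdin).
demo = parse_csv(IOBuffer("1,2\n3,4\n"))
println(demo)
```

Because the helper accepts any IO, the same function works unchanged on a file handle, a subprocess stream, or standard input.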
Re: [julia-users] Re: reading compressed csv file?
This is how I used GZip.jl in the tests for the MatrixMarket package https://github.com/JuliaSparse/MatrixMarket.jl/blob/ba60e447f24938952509bb42c6d6bf9223562ef8/test/dl-matrixmarket.jl#L7 Perhaps it might be useful for you.

Thanks,
Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory
Re: [julia-users] Re: reading compressed csv file?
dear tim, lex, todd (others): thanks for responding. I really want to learn how to preprocess input from somewhere else into the readcsv() function. it's a good starting exercise for me to learn how to accomplish tasks in general. there is so much to learn. [I did not experiment with GZip.jl --- modules are new to me, and this one is not included. I could make too many errors in this process. It will probably make the specific task easier.]

now, the first mistake which tripped me up for a while is that I did not grasp the difference between a string and a command. that is, I should not have used " for my command. I had needed to use `. this is why open("echo hi") did not work, but open(`echo hi`) does.

x=open(`gzcat myfile.csv.gz`) is a good start. I see it contains a tuple of a Pipe and a Process. this is printed by default on the command line. I learned I can make this work with d=readcsv( x[1] )

but I have a whole bunch of new questions, beyond question now. first, try this:

julia> x1=open(`gzcat d.csv.gz`)
(Pipe(closed, 35 bytes waiting),Process(`gzcat d.csv.gz`, ProcessExited(0)))
julia> x2=open(`gzcat d.csv.gz`)
(Pipe(active, 0 bytes waiting),Process(`gzcat d.csv.gz`, ProcessRunning))

how strange---the claims are different. even stranger, the first readcsv(x2[1]) is very slow now (I am talking 3 seconds on a 3 by 4 data file!); but following it with readcsv(x1[1]) is fast. I can't imagine readcsv has intelligence built-in to cache past specific conversions.

another strange definition from a novice perspective: close(x1) is not defined. close(x1[1]) is. julia is the first language I have seen where a close(open(file)) is wrong. this is esp surprising because julia has the dispatch ability to understand what it could do with a close(Pipe,Process) tuple. the same holds true for other functions that expect a part of open. julia should be smart enough to know this.

regards, /iaw

Ivo Welch (ivo.we...@gmail.com) http://www.ivo-welch.info/
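The string-versus-command distinction discussed in this thread can be checked directly. A minimal sketch, assuming a current Julia 1.x release and an echo executable on the PATH:

```julia
s = "echo hi"     # double quotes: an ordinary String
c = `echo hi`     # backticks: a Cmd object; nothing runs yet

println(typeof(c))

# Reading from a Cmd runs it and captures its standard output.
out = read(c, String)
println(out == "hi\n")
```

Calling open(s) on the string would instead look for a file literally named "echo hi", which is why the quoted form did not work as a command.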