Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-08-21 Thread Johan Tibell
As I mentioned, you want to use the Streaming (or Incremental) module.
As the program now stands, the call to `decode` causes all 1.5 GB of CSV
data to be read as a `Vector (Vector Int)` before any encoding starts.
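
For illustration, here is a minimal sketch of what the streaming variant
could look like. It assumes a recent cassava release in which
`Data.Csv.Streaming.decode` takes a `HasHeader` argument (older releases
may differ). The body of `vectorBinner` is a placeholder (a row sum) and
`binRecords` is a hypothetical helper name; neither is taken from the
pastebin code.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Csv (HasHeader (NoHeader))
import Data.Csv.Streaming (Records (Cons, Nil), decode)
import qualified Data.Vector as V
import System.IO (hPutStrLn, stderr)

-- Placeholder for the poster's function; the real body is not shown in the thread.
vectorBinner :: V.Vector Int -> Int
vectorBinner = V.sum

-- Lazily produce one result per CSV record. A record that fails to
-- convert becomes a Left; a parse error at the end of input is reported
-- as a final Left.
binRecords :: BL.ByteString -> [Either String Int]
binRecords = go . decode NoHeader
  where
    go (Cons r rest)      = fmap vectorBinner r : go rest
    go (Nil (Just err) _) = [Left err]
    go (Nil Nothing _)    = []

main :: IO ()
main = do
  input <- BL.getContents -- stdin is read lazily, chunk by chunk
  mapM_ (either (hPutStrLn stderr) print) (binRecords input)
```

Because the result list is consumed as it is produced, each record can be
garbage collected once its result has been printed, which should keep
memory use roughly constant rather than proportional to the input size.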

-- Johan


On Wed, Aug 21, 2013 at 1:09 PM, Justin Paston-Cooper
 wrote:
> Dear All,
>
> I now have some example code. I have put it on: http://pastebin.com/D9MPmyVd.
>
> vectorBinner is simply of type Vector Int -> Int. I am inputting a 1.5GB CSV
> on stdin, and would like vectorBinner to run over every single record,
> outputting results as computed, thus running in constant memory. My
> programme instead quickly approaches full memory use. Is there any way to
> work around this?
>
> Justin
>
>
> On 25 July 2013 17:53, Johan Tibell  wrote:
>>
>> You can use the Incremental or Streaming modules to get more fine
>> grained control over when new parsed records are produced.
>>
>> On Thu, Jul 25, 2013 at 11:02 AM, Justin Paston-Cooper
>>  wrote:
>> > I hadn't yet tried profiling the programme. I actually deleted it a few
>> > days ago. I'm going to try to get something new running, and I will
>> > report back. On a slightly less related track: Is there any way to use
>> > cassava so that I can have pure state and also yield CSV lines while my
>> > computation is running instead of everything at the end as would be
>> > with the State monad?
>> >
>> >
>> > On 23 July 2013 22:13, Johan Tibell  wrote:
>> >>
>> >> On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari 
>> >> wrote:
>> >> > Justin Paston-Cooper  writes:
>> >> >
>> >> >> Dear All,
>> >> >>
>> >> >> Recently I have been doing a lot of CSV processing. I initially
>> >> >> tried to use the Data.Csv (cassava) library provided on Hackage,
>> >> >> but I found this to still be too slow for my needs. In the
>> >> >> meantime I have reverted to hacking something together in C, but
>> >> >> I have been left wondering whether a tidy solution might be
>> >> >> possible to implement in Haskell.
>> >> >>
>> >> > Have you tried profiling your cassava implementation? In my
>> >> > experience I've found it's quite quick. If you have an example of
>> >> > a slow path I'm sure Johan (cc'd) would like to know about it.
>> >>
>> >> I'm always interested in examples of code that is not running fast
>> >> enough. Send me a reproducible example (preferably as a bug on the
>> >> GitHub bug tracker) and I'll take a look.
>> >
>> >
>
>

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-08-21 Thread Justin Paston-Cooper
Dear All,

I now have some example code. I have put it on: http://pastebin.com/D9MPmyVd.

vectorBinner is simply of type Vector Int -> Int. I am inputting a 1.5GB
CSV on stdin, and would like vectorBinner to run over every single record,
outputting results as computed, thus running in constant memory. My
programme instead quickly approaches full memory use. Is there any way to
work around this?

Justin


On 25 July 2013 17:53, Johan Tibell  wrote:

> You can use the Incremental or Streaming modules to get more fine
> grained control over when new parsed records are produced.
>
> On Thu, Jul 25, 2013 at 11:02 AM, Justin Paston-Cooper
>  wrote:
> > I hadn't yet tried profiling the programme. I actually deleted it a few
> > days ago. I'm going to try to get something new running, and I will
> > report back. On a slightly less related track: Is there any way to use
> > cassava so that I can have pure state and also yield CSV lines while my
> > computation is running instead of everything at the end as would be with
> > the State monad?
> >
> >
> > On 23 July 2013 22:13, Johan Tibell  wrote:
> >>
> >> On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari 
> >> wrote:
> >> > Justin Paston-Cooper  writes:
> >> >
> >> >> Dear All,
> >> >>
> >> >> Recently I have been doing a lot of CSV processing. I initially tried
> >> >> to use the Data.Csv (cassava) library provided on Hackage, but I found
> >> >> this to still be too slow for my needs. In the meantime I have reverted
> >> >> to hacking something together in C, but I have been left wondering
> >> >> whether a tidy solution might be possible to implement in Haskell.
> >> >>
> >> > Have you tried profiling your cassava implementation? In my experience
> >> > I've found it's quite quick. If you have an example of a slow path I'm
> >> > sure Johan (cc'd) would like to know about it.
> >>
> >> I'm always interested in examples of code that is not running fast
> >> enough. Send me a reproducible example (preferably as a bug on the
> >> GitHub bug tracker) and I'll take a look.
> >
> >
>


Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-07-25 Thread Johan Tibell
You can use the Incremental or Streaming modules to get more fine
grained control over when new parsed records are produced.
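
As a rough sketch of the Incremental interface: the parser is fed one
strict chunk at a time and hands back whatever records it has completed so
far. This assumes a recent cassava in which `Data.Csv.Incremental.decode`
takes a `HasHeader` argument and `Parser` has the constructors `Fail`,
`Many` and `Done`. The names `recordSum` and `feedChunks` are hypothetical
and used for illustration only.

```haskell
module Main (main) where

import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import Data.Csv (HasHeader (NoHeader))
import Data.Csv.Incremental (Parser (Done, Fail, Many), decode)
import qualified Data.Vector as V
import System.IO (hPutStrLn, stderr)

-- Placeholder per-record computation (an assumption, not from the thread).
recordSum :: V.Vector Int -> Int
recordSum = V.sum

-- Feed the parser one chunk at a time and lazily collect one result per
-- record. Records completed by a chunk are emitted before the next chunk
-- is demanded.
feedChunks :: Parser (V.Vector Int) -> [B.ByteString] -> [Either String Int]
feedChunks (Fail _ err) _ = [Left err]
feedChunks (Done rs) _ = map (fmap recordSum) rs
feedChunks (Many rs k) chunks = map (fmap recordSum) rs ++ next
  where
    next = case chunks of
      []       -> feedChunks (k B.empty) [] -- an empty chunk signals end of input
      (c : cs) -> feedChunks (k c) cs

main :: IO ()
main = do
  -- toChunks exposes the lazily read stdin as a list of strict chunks.
  chunks <- BL.toChunks <$> BL.getContents
  mapM_ (either (hPutStrLn stderr) print) (feedChunks (decode NoHeader) chunks)
```

The Streaming module offers much the same behaviour behind a simpler
interface; driving the parser by hand like this is mainly useful when you
want to control where the chunks come from or interleave other work
between them.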

On Thu, Jul 25, 2013 at 11:02 AM, Justin Paston-Cooper
 wrote:
> I hadn't yet tried profiling the programme. I actually deleted it a few days
> ago. I'm going to try to get something new running, and I will report back.
> On a slightly less related track: Is there any way to use cassava so that I
> can have pure state and also yield CSV lines while my computation is running
> instead of everything at the end as would be with the State monad?
>
>
> On 23 July 2013 22:13, Johan Tibell  wrote:
>>
>> On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari 
>> wrote:
>> > Justin Paston-Cooper  writes:
>> >
>> >> Dear All,
>> >>
>> >> Recently I have been doing a lot of CSV processing. I initially tried
>> >> to use the Data.Csv (cassava) library provided on Hackage, but I found
>> >> this to still be too slow for my needs. In the meantime I have reverted
>> >> to hacking something together in C, but I have been left wondering
>> >> whether a tidy solution might be possible to implement in Haskell.
>> >>
>> > Have you tried profiling your cassava implementation? In my experience
>> > I've found it's quite quick. If you have an example of a slow path I'm
>> > sure Johan (cc'd) would like to know about it.
>>
>> I'm always interested in examples of code that is not running fast
>> enough. Send me a reproducible example (preferably as a bug on the
>> GitHub bug tracker) and I'll take a look.
>
>



Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-07-25 Thread Justin Paston-Cooper
I hadn't yet tried profiling the programme. I actually deleted it a few
days ago. I'm going to try to get something new running, and I will report
back. On a slightly less related track: Is there any way to use cassava so
that I can have pure state and also yield CSV lines while my computation is
running instead of everything at the end as would be with the State monad?
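
To make the question concrete, here is a small sketch of the behaviour I
am after, using only base and no cassava: state is threaded purely with
`mapAccumL`, and output lines become available as the input is consumed.
The `step` function and the running-total state are made up for
illustration; in the real programme the records would come from a CSV
decoder.

```haskell
module Main (main) where

import Data.List (mapAccumL)

-- Hypothetical step function: the state is a running total, and each
-- record yields one output line of the form "recordSum,runningTotal".
step :: Int -> [Int] -> (Int, String)
step total record = (total', show s ++ "," ++ show total')
  where
    s = sum record
    total' = total + s

-- Thread the state through all records and keep only the output lines.
-- The result list is produced lazily, so lines can be written out before
-- the whole input has been seen.
runLines :: [[Int]] -> [String]
runLines = snd . mapAccumL step 0

main :: IO ()
main = mapM_ putStrLn (runLines [[1, 2], [3, 4], [5]])
```

Discarding the final state is what allows the lines to be consumed
incrementally; asking for the final state first would force the whole
input to be traversed.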


On 23 July 2013 22:13, Johan Tibell  wrote:

> On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari 
> wrote:
> > Justin Paston-Cooper  writes:
> >
> >> Dear All,
> >>
> >> Recently I have been doing a lot of CSV processing. I initially tried to
> >> use the Data.Csv (cassava) library provided on Hackage, but I found this
> >> to still be too slow for my needs. In the meantime I have reverted to
> >> hacking something together in C, but I have been left wondering whether
> >> a tidy solution might be possible to implement in Haskell.
> >>
> > Have you tried profiling your cassava implementation? In my experience
> > I've found it's quite quick. If you have an example of a slow path I'm
> > sure Johan (cc'd) would like to know about it.
>
> I'm always interested in examples of code that is not running fast
> enough. Send me a reproducible example (preferably as a bug on the
> GitHub bug tracker) and I'll take a look.
>


Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-07-23 Thread Johan Tibell
On Tue, Jul 23, 2013 at 5:45 PM, Ben Gamari  wrote:
> Justin Paston-Cooper  writes:
>
>> Dear All,
>>
>> Recently I have been doing a lot of CSV processing. I initially tried to
>> use the Data.Csv (cassava) library provided on Hackage, but I found this to
>> still be too slow for my needs. In the meantime I have reverted to hacking
>> something together in C, but I have been left wondering whether a tidy
>> solution might be possible to implement in Haskell.
>>
> Have you tried profiling your cassava implementation? In my experience
> I've found it's quite quick. If you have an example of a slow path I'm
> sure Johan (cc'd) would like to know about it.

I'm always interested in examples of code that is not running fast
enough. Send me a reproducible example (preferably as a bug on the
GitHub bug tracker) and I'll take a look.



Re: [Haskell-cafe] Ideas on a fast and tidy CSV library

2013-07-23 Thread Ben Gamari
Justin Paston-Cooper  writes:

> Dear All,
>
> Recently I have been doing a lot of CSV processing. I initially tried to
> use the Data.Csv (cassava) library provided on Hackage, but I found this to
> still be too slow for my needs. In the meantime I have reverted to hacking
> something together in C, but I have been left wondering whether a tidy
> solution might be possible to implement in Haskell.
>
Have you tried profiling your cassava implementation? In my experience
I've found it's quite quick. If you have an example of a slow path I'm
sure Johan (cc'd) would like to know about it.

> I would like to build a library that satisfies the following:
>
> 1) Run a function <<a_1 -> ... -> a_n -> m (Maybe (b_1, ..., b_n))>>,
> with <<m>> some monad and the <<a>>s and <<b>>s being input and output.
>
> 2) Be able to specify a maximum record string length and output record
> string length, so that the string buffers used for reading and outputting
> lines can be reused, preventing the need for allocating new strings for
> each record.
>
> 3) Allocate only once, the memory where the parsed input values, and output
> values are put.
>
Ultimately this could be rather tricky to enforce. Haskell code
generally does a lot of allocation and the RTS is well optimized to
handle this.

I've often found that trying to shoehorn a non-idiomatic "optimal"
imperative approach into Haskell produces worse performance than the
more readable, idiomatic approach.

I understand this leaves many of your questions unanswered, but I'd give
the idiomatic approach a bit more time before trying to coerce C into
Haskell. Profile, see where the hotspots are and optimize
appropriately. If the profile has you flummoxed, the lists and #haskell
are always willing to help given the time.

Cheers,

- Ben


