Re: Working with big datasets, merging two ordered lists by key

Frank Behrens Fri, 14 Mar 2014 09:44:04 -0700

I am still working on the solution (see gist <https://gist.github.com/9551489>)
and want to share my current thoughts.


The problem is to process a join over two big datasets (from different
sources).
Right now I am quite confident, as I break the problem into smaller parts,
and I am starting to see how easy this is in Clojure.
1) I have to bring both datasets (lists) into a uniform shape: [id
{attributes}] might be a good fit.
2) Because they are sorted and the id is unique (right now), with my
merge-sorted function I can pull the records from both lists, compare them
with a key function (defaults to identity in the case above) and pair them up.
3) From the resulting list of pairs I can filter the records which I am
interested in, and
4) do my processing over them.
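
To make steps 1) to 4) concrete, here is a minimal sketch of how such a pairing could look. It is not the code from the gist: the two-arity signature, the [a nil] / [nil b] convention for ids that exist on only one side, and the sample records are all assumptions made for illustration. It relies on both inputs being sorted ascending by key and on keys being unique within each input.

```clojure
;; Hypothetical sketch, not the gist's implementation.
(defn merge-sorted
  "Lazily pairs records from two sequences sorted ascending by (key-fn record).
  Emits [a b] when the keys match, [a nil] or [nil b] when a key is present
  on one side only. Assumes keys are unique within each input."
  ([xs ys] (merge-sorted identity xs ys))
  ([key-fn xs ys]
   (lazy-seq
     (let [xs (seq xs)
           ys (seq ys)]
       (cond
         ;; both inputs exhausted: end of the result
         (and (nil? xs) (nil? ys)) nil
         ;; only one input left: everything remaining is unmatched
         (nil? ys) (cons [(first xs) nil] (merge-sorted key-fn (rest xs) nil))
         (nil? xs) (cons [nil (first ys)] (merge-sorted key-fn nil (rest ys)))
         :else
         (let [x (first xs)
               y (first ys)
               c (compare (key-fn x) (key-fn y))]
           (cond
             (neg? c) (cons [x nil] (merge-sorted key-fn (rest xs) ys))
             (pos? c) (cons [nil y] (merge-sorted key-fn xs (rest ys)))
             :else    (cons [x y]   (merge-sorted key-fn (rest xs) (rest ys))))))))))

;; Step 1: both datasets in the shape [id {attributes}] (made-up sample data).
(def left  [[1 {:name "a"}] [2 {:name "b"}] [4 {:name "d"}]])
(def right [[2 {:city "x"}] [3 {:city "y"}] [4 {:city "z"}]])

;; Steps 2 to 4: pair by id, keep ids present on both sides, merge attributes.
(def joined
  (->> (merge-sorted first left right)
       (filter (fn [[l r]] (and l r)))
       (map (fn [[[id a] [_ b]]] [id (merge a b)]))))
```

Keeping the unmatched records as [a nil] and [nil b] means the same function could serve inner, left and full joins; the filter in step 3 decides which one you get.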

This approach seems simple and flexible to me, and it would be very useful for
different problems we have at our big enterprise.

I am close to putting the parts together, and will then see how this fits
in memory and whether it solves my current problem.

But in my newbie Clojure dreams, I could imagine getting this done in a lazy
fashion.

Can my ClojureCLR database query, sorted text file, merging, filtering and
processing all be lazy?
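
For the filtering and processing part, a small experiment suggests the answer is yes, since filter, map and take all return lazy sequences. The sketch below uses a hypothetical source function as a stand-in for a database cursor or file reader (it is not a real ClojureCLR API) and counts how many records it actually produces. Whether the real query and the text file can be consumed lazily depends on how they are read, and the underlying reader or connection would have to stay open until the sequence has been consumed.

```clojure
;; Counts how many records the stand-in source has produced so far.
(def realized (atom 0))

(defn source
  "Hypothetical stand-in for a database cursor or file reader: an unbounded,
  sorted, lazy sequence of [id attrs] records."
  []
  (map (fn [id]
         (swap! realized inc)
         [id {:value (* id 10)}])
       (iterate inc 1)))

;; Filter and process lazily; only take forces any work.
(def wanted
  (->> (source)
       (filter (fn [[id _]] (even? id)))
       (map (fn [[id attrs]] [id (:value attrs)]))
       (take 3)
       doall))

;; wanted holds three records although the source is unbounded, and
;; @realized stays small because nothing beyond what take needed was produced.
```

One caveat: lazy sequences built on chunked sources such as vectors are realized in chunks of up to 32 elements, so "lazy" does not always mean strictly one record at a time.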



-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to [email protected]
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
