Re: Working with big datasets, merging two ordered lists by key
I am still working on the solution (see gist https://gist.github.com/9551489) and want to share my current thoughts. The problem is to process a join over two big datasets from different sources. Right now I am quite confident, as I have broken the problem into smaller parts and am starting to see how easy this is in Clojure.

1) I have to bring both datasets (lists) into a nice form: [id {attributes}] might be a good fit.
2) Because they are sorted and the id is unique (right now), my merge-sorted function can pull the records from the lists, compare them with a key function (which defaults to identity in the example above) and pair them up.
3) From the resulting list of pairs I can filter the records I am interested in, and
4) do my processing over them.

This approach seems simple and flexible to me, and would be very useful for different problems we have at our big enterprise. I am close to putting the parts together, and will then see how this fits in memory and whether it solves my current problem.

But in my newbie Clojure dreams, I could imagine getting this done in a lazy fashion. Can my ClojureCLR database query, sorted text file, merging, filtering and processing all be lazy?

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
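One way the four steps in the message above might be written lazily is sketched below. This is only an illustration, not the code from the gist: `merge-sorted`, `joined` and the sample data are hypothetical names, both inputs are assumed to be sorted ascending by id with unique ids, and records are assumed to be non-nil. Whether the database query and the file reader themselves deliver records lazily depends on the libraries used and is not shown here.

```clojure
;; Hypothetical helper; uses only clojure.core.
(defn merge-sorted
  "Lazily pairs up two seqs that are sorted ascending by (key-fn item).
  Yields [x y] when the keys match, [x nil] or [nil y] otherwise.
  Assumes keys are unique within each seq and items are non-nil."
  ([xs ys] (merge-sorted identity xs ys))
  ([key-fn xs ys]
   (lazy-seq
     (let [xs (seq xs)
           ys (seq ys)]
       (cond
         (and (nil? xs) (nil? ys)) nil
         (nil? xs) (map (fn [y] [nil y]) ys)   ; only ys left
         (nil? ys) (map (fn [x] [x nil]) xs)   ; only xs left
         :else
         (let [x (first xs)
               y (first ys)
               c (compare (key-fn x) (key-fn y))]
           (cond
             (zero? c) (cons [x y] (merge-sorted key-fn (rest xs) (rest ys)))
             (neg? c)  (cons [x nil] (merge-sorted key-fn (rest xs) ys))
             :else     (cons [nil y] (merge-sorted key-fn xs (rest ys))))))))))

;; Step 1: both datasets in [id {attributes}] form (toy data from the thread).
(def user-textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
(def user-database [[:id1 {:age 38}] [:id2 {:age 27}] [:id3 {:age 18}] [:id4 {:age 60}]])

(def joined
  (->> (merge-sorted first user-database user-textfile) ; step 2: pair up by id
       (filter (fn [[db txt]] (and db txt)))             ; step 3: keep ids present in both
       (map (fn [[[id db-attrs] [_ txt-attrs]]]          ; step 4: process each pair
              [id (merge txt-attrs db-attrs)]))))
```

Because `merge-sorted` is built on `lazy-seq`, pairs are only produced as the result is consumed. Holding the result in a var, as `joined` does here for demonstration, retains every realized element; for large inputs it would more likely be consumed directly, for example with `doseq` or `reduce`.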
Re: Working with big datasets, merging two ordered lists by key
Thanks for your suggestions.

A for loop has to do 100,000 * 300,000 compares. Storing the database table in a 300,000-element hash would be a memory penalty I want to avoid.

I'm quite sure that the essential part of the solution is a function that iterates through both lists at once, spitting out pairs of values according to compare:

    (merge-sortedlists '(1 2 3) '(2 4)) = ([1 nil] [2 2] [3 nil] [nil 4])

Seems quite doable. I will try to implement it now.

Frank

On Monday, March 10, 2014 at 01:23:57 UTC+1, frye wrote:
> Hmm, the *for* comprehension yields a lazy sequence of results. So the penalty should only occur when one starts to use / evaluate the result. [...]
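To put the two approaches side by side, the comparison counts can be worked out from the record counts Frank quotes in this message. The names below are illustrative only, and the merge figure is an upper bound that assumes both lists are already sorted by id.

```clojure
(def textfile-count 100000)
(def database-count 300000)

;; A nested iteration compares every textfile record with every database record.
(def nested-compares (* textfile-count database-count))

;; A single merge pass advances at least one of the two lists per comparison,
;; so it never needs more comparisons than the combined length.
(def merge-compares-max (+ textfile-count database-count))

(println nested-compares)     ; 30000000000
(println merge-compares-max)  ; 400000
```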
Re: Working with big datasets, merging two ordered lists by key
Hey,

just to share: I came up with this code, which seems quite OK to me. It feels like I already understand something. Do I?

Have a nice day,
Frank

    (loop [a '(1 2 3 4)
           b '(1 3)
           out ()]
      (cond
        ;; both lists exhausted: done
        (and (empty? a) (empty? b)) out
        ;; only b has elements left
        (empty? a) (recur a (rest b) (conj out [nil (first b)]))
        ;; only a has elements left
        (empty? b) (recur (rest a) b (conj out [(first a) nil]))
        :else
        (let [fa (first a)
              fb (first b)
              cmp (compare fa fb)]
          (cond
            ;; same value in both lists: emit the pair, advance both
            (= 0 cmp) (recur (rest a) (rest b) (conj out [fa fb]))
            ;; fa is smaller: emit it alone, advance a
            (> 0 cmp) (recur (rest a) b (conj out [fa nil]))
            ;; fb is smaller: emit it alone, advance b
            :else (recur a (rest b) (conj out [nil fb]))))))

On Monday, March 10, 2014 at 09:26:14 UTC+1, Frank Behrens wrote:
> [...] Seems quite doable. Try to implement now.
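Since `conj` adds to the front of a list, the loop in this message builds its result in descending order. A possible variant, wrapped in a function and accumulating into a vector so that the pairs come out ascending, might look like this. The name `merge-sorted-lists` is borrowed from earlier in the thread; the rest is a sketch that assumes both inputs are sorted ascending and contain no nil elements.

```clojure
(defn merge-sorted-lists
  "Eagerly pairs up two ascending seqs and returns a vector of pairs
  in ascending order. Unmatched values are paired with nil."
  [a b]
  (loop [a (seq a)
         b (seq b)
         out []]                       ; conj onto a vector appends
    (cond
      (and (nil? a) (nil? b)) out
      (nil? a) (recur a (next b) (conj out [nil (first b)]))
      (nil? b) (recur (next a) b (conj out [(first a) nil]))
      :else
      (let [fa (first a)
            fb (first b)
            c  (compare fa fb)]
        (cond
          (zero? c) (recur (next a) (next b) (conj out [fa fb]))
          (neg? c)  (recur (next a) b (conj out [fa nil]))
          :else     (recur a (next b) (conj out [nil fb])))))))

(merge-sorted-lists '(1 2 3 4) '(1 3))
;; => [[1 1] [2 nil] [3 3] [4 nil]]
```

The two inputs here have different lengths, which the algorithm handles through the `(nil? a)` and `(nil? b)` branches.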
Re: Working with big datasets, merging two ordered lists by key
Hey Frank,

Right. So I tried this loop / recur, and it runs, giving a result of *([4 nil] [3 3] [2 nil] [1 1])*. But I'm not sure how that's going to help you (although I'm not discounting the possibility).

You can simultaneously iterate through pairs of lists to compare values. However, you cannot guarantee that those lists will be *i)* ordered and *ii)* the same length. Both of those conditions are required for your algorithm to work. Plus, what you suggest still means that you'll have to scan through the entire space of both results. So we're not going to avoid that.

Based on your requirements, I still see my original *for* comprehension as the most straightforward way to solve the problem. My second suggested algorithm could also work. But I could be wrong and am always learning too. So trying different solutions is a good habit to keep.

Hth

Tim Washington
Interruptsoftware.com http://interruptsoftware.com

On Mon, Mar 10, 2014 at 4:53 AM, Frank Behrens fbehr...@gmail.com wrote:
> Hey, just to share, I came up with this code, which seem quite ok to me [...]
Re: Working with big datasets, merging two ordered lists by key
Re. Tim's points below:

*i)* The seqs have to be ordered, or one of them has to be loaded fully into memory; I don't think there's any way around that.
*ii)* Frank's solution does *not* require the seqs to be the same length, and it gives you the complete 'diff' of the seqs (aka outer join), which could be handy.

The one snag I see is that it is eager, not lazy, so it's going to put the answer completely in memory. So unless you are projecting out a small subset of the fields from each record, you will probably end up using as much memory as the other solutions. I wrote a lazy version using 'iterate', but I'm not sure it doesn't keep both entire seqs in memory, too.

My two cents:

1. If you have enough memory, go with Moritz' suggestion to read the smaller seq into a map. Then you can do a simple for comprehension and arrange it so that the second, larger seq will never be completely in memory.
2. Another possible solution is to load the textfile into a temp table in your database. Then the solution is one simple SQL query, backed by hyper-optimized code designed to deal with this exact problem.
3. You may want to try the naive approach: 400k records sounds like it could very well fit into memory, as long as each record doesn't have a huge amount of data.
4. A library that has tools to deal with big files: https://github.com/kyleburton/clj-etl-utils

--Leif

On Monday, March 10, 2014 11:01:07 PM UTC-4, frye wrote:
> [...] You can simultaneously iterate through pairs of lists, to compare values. However you cannot guarantee that those lists will be *i)* ordered, and *ii)* the same length. Both those conditions are required for your algorithm to work. [...]
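Suggestion 1 in Leif's list could look roughly like the following sketch. `textfile-by-id` and `join-with-lookup` are hypothetical names and the data is the toy data from earlier in the thread; it assumes the smaller dataset fits in memory and that ids are unique.

```clojure
;; The smaller dataset does not need to be sorted for this approach.
(def user-textfile [[:id3 {:name "Tim"}] [:id1 {:name "Frank"}]])
(def user-database [[:id1 {:age 38}] [:id2 {:age 27}] [:id3 {:age 18}] [:id4 {:age 60}]])

;; Read the smaller dataset into a map for random access by id.
(def textfile-by-id (into {} user-textfile))

(defn join-with-lookup
  "Lazily walks `records` (the larger seq) and merges in the attributes
  stored under the same id in `lookup`. Ids missing from `lookup` are skipped."
  [lookup records]
  (for [[id attrs] records
        :let [other (get lookup id)]
        :when other]
    [id (merge other attrs)]))

(join-with-lookup textfile-by-id user-database)
;; => ([:id1 {:name "Frank", :age 38}] [:id3 {:name "Tim", :age 18}])
```

Since `for` returns a lazy sequence, the larger seq is walked one record at a time as the result is consumed, so only the lookup map has to sit fully in memory. Note that this is an inner join: ids that appear in only one dataset are dropped.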
Re: Working with big datasets, merging two ordered lists by key
Hey Frank,

Try opening up a repl, and running this *for* comprehension.

    (def user_textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
    (def user_database [[:id1 {:age 38}] [:id2 {:age 27}] [:id3 {:age 18}] [:id4 {:age 60}]])

    (for [i user_textfile
          j user_database
          :when (= (first i) (first j))]
      {(first i) (merge (second i) (second j))})

    ;; ({:id1 {:age 38, :name "Frank"}} {:id3 {:age 18, :name "Tim"}})  ; result from repl

Hth

Tim Washington
Interruptsoftware.com http://interruptsoftware.com

On Sun, Mar 9, 2014 at 5:33 AM, Frank Behrens fbehr...@gmail.com wrote:
> Hi,
>
> I'm investigating whether Clojure can be used to solve the challenges and problems we have at my day job better than Ruby or PowerShell. A very common use case is validating data from different systems against some criteria. I believe Clojure can be our silver bullet, but before that, I need to wrap my head around it.
>
> So I am starting at the first level with the challenge of validating some data from the user database against our Active Directory. I already have all the parts to make it work: make a hash by user_id from the database table, export a text file from AD with each line representing a user, parse it, merge in the information from the user_table_hash, and voilà.
>
> I have not finished implementing this, so I don't know if this naive approach will work with 400,000 records in the user database and 100,000 in the text file. But I am already thinking about how I could implement this in a more memory-efficient way.
>
> So my simple question: I have user_textfile (100,000 records), which can be parsed into an unordered list of user-maps. I have user_table in the database (400,000 records), which I can query in sorted order and which gives me an ordered list of user-maps. So I would first order the user_textfile and then conj the user_table ordered list into it while doing the database query. Is that approach right? How would I then merge the two ordered lists like in the example below?
>
>     (def user_textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
>     (def user_database [[:id1 {:age 38}] [:id2 {:age 27}] [:id3 {:age 18}] [:id4 {:age 60}]])
>
>     (merge-sorted-lists user_database user_textfile)
>     ;; = ([:id1 {:name "Frank" :age 38}] [:id3 {:name "Tim" :age 18}])
>
> Any feedback is appreciated.
>
> Have a nice day,
> Frank
Re: Working with big datasets, merging two ordered lists by key
I think it would be more efficient to read one of the inputs into a map for random access instead of iterating it every time.

On Sun, Mar 9, 2014 at 4:48 PM, Timothy Washington twash...@gmail.com wrote:
> Hey Frank, Try opening up a repl, and running this for comprehension. [...]
Re: Working with big datasets, merging two ordered lists by key
Hmm, the *for* comprehension yields a lazy sequence of results. So the penalty should only occur when one starts to use / evaluate the result. Using maps is a good idea. But I think you'll have to use another algorithm (not *for*) to get the random access you seek.

Frank could try a *clojure.set/intersection* to find common keys between the lists, then *order* and *map* / *merge* the 2 lists. Beyond that, I can't see a scenario where some iteration won't have to search the space for matching keys (which I think *clojure.set/intersection* does).

A fair point all the same.

Tim Washington
Interruptsoftware.com http://interruptsoftware.com

On Sun, Mar 9, 2014 at 12:13 PM, Moritz Ulrich mor...@tarn-vedra.de wrote:
> I think it would be more efficient to read one of the inputs into a map for random access instead of iterating it every time.
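A minimal sketch of the *clojure.set/intersection* idea from this message, using the toy data from the thread. The var names are made up for illustration, and the approach assumes both datasets fit in memory as maps, which may not hold for the real data.

```clojure
(require '[clojure.set :as set])

(def user-textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
(def user-database [[:id1 {:age 38}] [:id2 {:age 27}] [:id3 {:age 18}] [:id4 {:age 60}]])

;; Both datasets as maps keyed by id.
(def textfile-by-id (into {} user-textfile))
(def database-by-id (into {} user-database))

;; Ids that occur in both datasets.
(def common-ids
  (set/intersection (set (keys textfile-by-id))
                    (set (keys database-by-id))))

;; Order the common ids, then map over them and merge the attributes.
(def merged
  (map (fn [id] [id (merge (textfile-by-id id) (database-by-id id))])
       (sort common-ids)))

(println merged)
```

With this data `common-ids` is `#{:id1 :id3}`, and `merged` contains one `[id attributes]` pair for each of those two ids.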