Re: Working with big datasets, merging two ordered lists by key

2014-03-14 Thread Frank Behrens
I am still working on the solution (see gist) and want to share my current 
thoughts.

The problem is to process a join over two big datasets (from different 
sources). 
Right now I am quite confident, as I have broken the problem into smaller 
parts, and I am starting to see how easy this is in Clojure.
1) I have to bring both datasets (lists) into a nice form: [id 
{attributes}] might be a good fit.
2) Because both lists are sorted and the id is unique (right now), my 
merge-sorted function can pull the records from the lists, compare them 
with a key function (which defaults to identity, as in my earlier example) 
and pair them up.
3) From the resulting list of pairs I can filter the records I am 
interested in, and 
4) do my processing over them.
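
The four steps above can be sketched roughly as follows. This is only a minimal illustration, not the code from the gist: the name merge-sorted-by, its argument order and the sample records are assumptions made for the example.

```clojure
;; Hypothetical helper (not from the gist): lazily pairs up two sequences
;; that are both sorted by (keyfn record).
;; Emits [a b] when the keys match, [a nil] or [nil b] otherwise.
(defn merge-sorted-by
  [keyfn as bs]
  (lazy-seq
    (let [a (first as)
          b (first bs)]
      (cond
        (and (empty? as) (empty? bs)) nil
        (empty? bs) (cons [a nil] (merge-sorted-by keyfn (rest as) bs))
        (empty? as) (cons [nil b] (merge-sorted-by keyfn as (rest bs)))
        :else
        (let [c (compare (keyfn a) (keyfn b))]
          (cond
            (zero? c) (cons [a b] (merge-sorted-by keyfn (rest as) (rest bs)))
            (neg? c)  (cons [a nil] (merge-sorted-by keyfn (rest as) bs))
            :else     (cons [nil b] (merge-sorted-by keyfn as (rest bs)))))))))

;; Step 1: records in the form [id {attributes}], sorted by id.
(def textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
(def database [[:id1 {:age 38}] [:id2 {:age 27}]
               [:id3 {:age 18}] [:id4 {:age 60}]])

(def result
  (->> (merge-sorted-by first database textfile)   ; step 2: pair up by id
       (filter (fn [[db txt]] (and db txt)))       ; step 3: keep matched pairs
       (map (fn [[[id db-attrs] [_ txt-attrs]]]    ; step 4: process each pair
              [id (merge db-attrs txt-attrs)]))))
;; result holds [:id1 ...] and [:id3 ...] with :age and :name merged
```

Because every stage returns a lazy sequence, nothing is computed until result is consumed, and the filter in step 3 can be swapped for any other predicate on the pairs.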

This approach seems simple and flexible to me, and would be very useful for 
different problems we have at our big enterprise.

I am close to putting the parts together, and will then see how this fits 
in memory, and whether it solves my current problem.

But in my newbie Clojure dreams, I could imagine getting this done in a lazy 
fashion.

Can my ClojureCLR database query, sorted text file, merging, filtering and 
processing all be lazy?


  

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Get a LAZY list of records from a database query ?

2014-03-14 Thread Frank Behrens
Hi, I wonder if it's possible to convert this database query (it's CLR) into 
a lazy sequence.

The reader loop is wrapped in the opening and closing of the db connection 
and reader.

When I take only a few records from the sequence, will the connection then 
be closed because it goes out of scope and gets garbage collected?

How is that possible?

(defn user [conn-str]
  (let [conn   (System.Data.SqlClient.SqlConnection. conn-str)
        _      (.Open conn)
        cmd    (System.Data.SqlClient.SqlCommand.
                 "SELECT name from User" conn)
        reader (.ExecuteReader cmd)]
    ;; Read every row eagerly, then close the reader and the connection.
    ;; A vector is used so that conj keeps the names in read order.
    (let [out (loop [out []]
                (if (.Read reader)
                  (do
                    (print ".")
                    (recur (conj out (.GetString reader 0))))
                  out))]
      (.Close reader)
      (.Close conn)
      out)))
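
One possible way to make such a reader lazy is sketched below. It is an illustration under assumptions, not a ClojureCLR API: closing-seq, read-next! and close! are hypothetical names, and an atom stands in for the database reader so the example runs without a connection. The point it shows is that the lazy sequence closes the resource only when it is realized to the end; a consumer that takes just a few records has to close the resource itself.

```clojure
;; Hypothetical helper: `read-next!` returns the next record, or nil when
;; the source is exhausted; `close!` releases the underlying resources.
(defn closing-seq
  [read-next! close!]
  (lazy-seq
    (if-some [record (read-next!)]
      (cons record (closing-seq read-next! close!))
      (do (close!) nil))))

;; Stand-in for a database reader: an atom holding the remaining rows.
(def rows (atom ["Ann" "Bob" "Cid"]))
(def closed? (atom false))

(defn next-row! []
  (let [row (first @rows)]
    (swap! rows rest)
    row))

(def names (closing-seq next-row! #(reset! closed? true)))

(def first-two (doall (take 2 names)))   ; realizes only two rows
(def closed-after-two @closed?)          ; false: nothing has closed the "reader"
(def all-names (doall names))            ; realizing the rest reaches the end
(def closed-after-all @closed?)          ; true: close! ran at the end
```

With a real reader, read-next! would call (.Read reader) and return the row data, and close! would close the reader and the connection.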



Re: Need help understanding a lazy sequence function

2014-03-10 Thread Frank Behrens
Hello,

I'm trying to understand lazy sequences: how they work, how to create them, 
and how to avoid realising them prematurely.

Can someone point me to documentation that would be helpful, and where I 
can find it?
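
A small experiment can make the realisation order visible. The sketch below is only an illustration; the names realized and numbers are made up for the example. It logs each element at the moment it is computed.

```clojure
;; Records which elements have actually been computed.
(def realized (atom []))

(defn numbers
  "Infinite lazy sequence n, n+1, ... that logs each element it computes."
  [n]
  (lazy-seq
    (swap! realized conj n)
    (cons n (numbers (inc n)))))

(def s (numbers 0))                    ; nothing is computed yet
(def before @realized)                 ; => []
(def first-three (doall (take 3 s)))   ; => (0 1 2)
(def after @realized)                  ; => [0 1 2]
```

Sequences built with lazy-seq like this one are realised one element at a time. Some built-in sources, such as range and vectors, are chunked, so mapping over them may realise up to 32 elements at once.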

Frank

On Monday, 10 March 2014 13:16:00 UTC+1, Asfand Yar Qazi wrote:
>
> On Monday, 10 March 2014 11:35:30 UTC, Alan Forrester wrote:
>>
>> According to the documentation for map 
>> http://clojuredocs.org/clojure_core/clojure.core/map 
>> (map + x y) 
>>
>> where x and y are two collections adds the first element of x to the 
>> first element of y, the second element of x to the second element of y 
>> and so on until either x or y is exhausted. 
>>
>
> OK I feel like an idiot - I was going by what I picked up from Clojure 
> Programming, and didn't read the official API docs, sorry.  The "until 
> either x or y is exhausted" bit is what was the missing piece of the puzzle.
>  
>
>> You seem to be trying to imagine how lazy-seqs work rather than 
>> reading the documentation, which tells you how they behave when you 
>> run a program or type an expression into the REPL.
>>
>
> I will take your advice on-board.
>
> Many thanks
>  
>
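
The behaviour of map described in the quoted text can be checked at the REPL. The collections x and y below are made-up sample data.

```clojure
(def x [1 2 3 4])
(def y [10 20])

;; map stops as soon as the shorter collection is exhausted.
(def sums (map + x y))              ; => (11 22)

;; The same rule makes it safe to pair a finite collection
;; with an infinite lazy one.
(def indexed (map + x (range)))     ; => (1 3 5 7)
```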



Re: Working with big datasets, merging two ordered lists by key

2014-03-10 Thread Frank Behrens
Hey, just to share: I came up with this code, which seems quite OK to me.
It feels like I already understand something, or do I?
Have a nice day, Frank

(loop [a   '(1 2 3 4)
       b   '(1 3)
       out []]                ; a vector, so conj keeps the pairs in order
  (cond
    (and (empty? a) (empty? b)) out
    (empty? a) (recur a (rest b) (conj out [nil (first b)]))
    (empty? b) (recur (rest a) b (conj out [(first a) nil]))
    :else (let [fa  (first a)
                fb  (first b)
                cmp (compare fa fb)]
            (cond
              (= 0 cmp) (recur (rest a) (rest b) (conj out [fa fb]))
              (> 0 cmp) (recur (rest a) b (conj out [fa nil]))
              :else     (recur a (rest b) (conj out [nil fb]))))))
;; => [[1 1] [2 nil] [3 3] [4 nil]]



Re: Working with big datasets, merging two ordered lists by key

2014-03-10 Thread Frank Behrens
Thanks for your suggestions. 
A for loop has to do 100.000 * 300.000 compares.
Storing the database table in a 300.000-element hash would be a memory 
penalty I want to avoid.
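
To put rough numbers on this, the following sketch compares the nested loop with a single pass over two sorted lists. The record counts are the ones mentioned above; the var names are made up for the example.

```clojure
(def textfile-count 100000)
(def database-count 300000)

;; A nested loop compares every text-file record with every database record.
(def nested-compares (* textfile-count database-count))   ; => 30000000000

;; A single pass over two sorted lists drops at least one record per step,
;; so it needs at most this many steps.
(def merge-steps (+ textfile-count database-count))       ; => 400000
```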

I'm quite sure that the essential part of the solution is a function that 
iterates through both lists at once, 
emitting pairs of values according to compare:

(merge-sortedlists 
  '(1 2 3)
  '(  2   4))
=> ([1 nil] [2 2] [3 nil] [nil 4])

Seems quite doable.
I will try to implement it now.

Frank


On Monday, 10 March 2014 01:23:57 UTC+1, frye wrote:
>
> Hmm, the *for* comprehension yields a lazy sequence of results. So the 
> penalty should only occur when one starts to use / evaluate the result. 
> Using maps is a good idea. But I think you'll have to use another algorithm 
> (not *for*) to get the random access you seek. 
>
> Frank could try a *clojure.set/intersection* to find common keys between 
> the lists. then *order* and *map* / *merge* the 2 lists. 
>
> Beyond that, I can't see a scenario where some iteration won't have to 
> search the space for matching keys (which I think 
> *clojure.set/intersection* does). A fair point all the same. 
>
>
> Tim Washington 
> Interruptsoftware.com <http://interruptsoftware.com> 
>
>
> On Sun, Mar 9, 2014 at 12:13 PM, Moritz Ulrich wrote:
>
>> I think it would be more efficient to read one of the inputs into a
>> map for random access instead of iterating it every time.
>>
>> On Sun, Mar 9, 2014 at 4:48 PM, Timothy Washington wrote:
>> > Hey Frank,
>> >
>> > Try opening up a repl, and running this for comprehension.
>> >
>> > (def user_textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
>> > (def user_database [[:id1 {:age 38}] [:id2 {:age 27}]
>> >                     [:id3 {:age 18}] [:id4 {:age 60}]])
>> >
>> > (for [i user_textfile
>> >       j user_database
>> >       :when (= (first i) (first j))]
>> >   {(first i) (merge (second i) (second j))})
>> >
>> > ({:id1 {:age 38, :name "Frank"}} {:id3 {:age 18, :name "Tim"}})  ;; result from repl
>> >
>> >
>> >
>> > Hth
>> >
>> > Tim Washington
>> > Interruptsoftware.com
>> >
>> >
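
For comparison, Moritz's suggestion quoted above (read one input into a map for random access) might look like the sketch below. It reuses the sample data from Tim's example; textfile-by-id and joined are names made up for the illustration. The smaller input is held in memory as a map while the larger one is streamed past it once.

```clojure
(def user_textfile [[:id1 {:name "Frank"}] [:id3 {:name "Tim"}]])
(def user_database [[:id1 {:age 38}] [:id2 {:age 27}]
                    [:id3 {:age 18}] [:id4 {:age 60}]])

;; Index the smaller input by id; each [id attrs] vector becomes a map entry.
(def textfile-by-id (into {} user_textfile))

;; One pass over the larger input with a map lookup per record.
(def joined
  (for [[id attrs] user_database
        :let [other (get textfile-by-id id)]
        :when other]
    [id (merge attrs other)]))
;; joined holds [:id1 ...] and [:id3 ...] with :age and :name merged
```

This trades memory for simplicity: neither input has to be sorted, but one of them must fit in memory as a map.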

Working with big datasets, merging two ordered lists by key

2014-03-09 Thread Frank Behrens
Hi,

I'm investigating whether Clojure can solve the challenges and problems we 
have at my day job better than Ruby or PowerShell. A very common use case 
is validating data from different systems against some criteria. I believe 
Clojure can be our silver bullet, but first I need to wrap my head around 
it.

So I am starting at the first level, with the challenge of validating some 
data from the user database against our Active Directory.

I already have all the parts to make it work: make a hash by user_id from 
the database table, export a text file from AD with each line representing 
a user, parse it, merge in the information from the user_table_hash, and 
voilà. 

I have not finished implementing this, so I don't know whether this naive 
approach will work with 400.000 records in the user database and 100.000 
in the text file.
But I am already thinking about how I could implement this in a more 
memory efficient way.

So my simple question:

I have user_textfile (100.000 records), which can be parsed into an 
unordered list of user maps.
I have user_table in the database (400.000 records), which I can query with 
an ordering, giving me an ordered list of user maps.

So I would first sort the user_textfile and then conj the ordered 
user_table list into it while doing the database query.
Is that approach right? How would I then merge the two ordered lists, as 
in the example below?

(def user_textfile
  [[:id1 {:name "Frank"}]
   [:id3 {:name "Tim"}]])

(def user_database
  [[:id1 {:age 38}]
   [:id2 {:age 27}]
   [:id3 {:age 18}]
   [:id4 {:age 60}]])

(merge-sorted-lists user_database user_textfile)
=>
  ([:id1 {:name "Frank" :age 38}]
   [:id3 {:name "Tim"   :age 18}])

Any feedback is appreciated.
Have a nice day,
Frank



Re: Getting started - overcoming my first obstacles

2014-03-09 Thread Frank Behrens
Dear Florian (or anybody),

I really like how your post is so beautifully syntax highlighted.
How did you do that?

Have a nice day! Frank
(PS this Frank)
 
