Re: Architectural doubts

2015-02-01 Thread Milton Silva
For now I'm hand-coding a big loop with transients and a ByteBuffer. It is
an incredibly ugly imperative mess, but it is fast enough and so far it
fits nicely into RAM.
It takes 2s (at this point gloss was taking upwards of 5 minutes) to decode
200MB all the way to TCP and uses 500MB of heap. This is faster than
Wireshark (10s) but a bit more memory-hungry. I will see how it responds when
decoding Diameter; so far that part is stored as byte arrays.
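
A minimal sketch of what such a transient-accumulating decode loop could look like; `read-frame` is a hypothetical helper that pulls one decoded frame off the buffer, not part of the original post:

```clojure
(import 'java.nio.ByteBuffer)

;; Sketch only: accumulate decoded frames into a transient vector to avoid
;; per-conj persistent-structure overhead, then freeze it once at the end.
(defn decode-all [^ByteBuffer buf read-frame]
  (loop [frames (transient [])]
    (if (.hasRemaining buf)
      (recur (conj! frames (read-frame buf)))
      (persistent! frames))))
```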

Thank you for the pointers with datomic. Are the indexes created 
automatically or do I need to specify them on the schema?

  

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Architectural doubts

2015-02-01 Thread adrian . medina
If you're interested in efficient binary decoding/encoding with a more 
pleasant API than standard NIO ByteBuffers, check out Netty's buffer 
package, io.netty.buffer (http://netty.io/5.0/api/index.html). 
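
For reference, a small round-trip with Netty's buffer API might look like the sketch below. Unlike `java.nio.ByteBuffer`, a `ByteBuf` keeps separate reader and writer indexes, so there is no `flip()` dance:

```clojure
(import 'io.netty.buffer.Unpooled)

;; Sketch: write two values, then read them back in write order.
(let [buf (Unpooled/buffer 16)]
  (.writeInt buf 42)
  (.writeShort buf 7)
  [(.readInt buf) (.readShort buf)])
```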




Re: Architectural doubts

2015-02-01 Thread Ashton Kemerling
Because of the way that Datomic stores its data (5-tuples: entity,
attribute, value, transaction, and operation), it has some pretty simple
indexes that allow for powerful queries. EAVT allows for rapidly searching
by entity, VAET allows for quick reverse lookups, etc. The only index you
get a choice on is the AVET index, which allows for searching across an
attribute (a column-level search in the relational world), and that's toggled
by the :db/index attribute on the schema. Which index a query uses depends
on what you're trying to do with datalog; it's pretty much out of the user's
hands unless you start using the raw index API.

The current word from the Datomic team is that they highly recommend
leaving the index on for all attributes, as it consumes a small enough
amount of data in most scenarios. Datomic is also pretty good at keeping
only useful indexes in memory, so you should probably be fine.
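
Concretely, toggling AVET indexing for an attribute could look like the schema fragment below (a sketch; the :frame/src-ip attribute name is made up for illustration):

```clojure
;; Hypothetical attribute definition; :db/index true opts this attribute
;; into the AVET index so you can search across its values efficiently.
(def frame-schema
  [{:db/ident       :frame/src-ip
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/index       true}])
```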

--
Ashton




Re: Architectural doubts

2015-01-31 Thread Timothy Baldridge
Since the data you are handing to the Datomic query engine is un-indexed,
portions of the query will effectively be O(n). However, if you do as Jan
suggests and put the data into Datomic, the data will automatically be indexed
several ways. Then when you query against the Datomic db, the engine will
pick up on these indexes and your queries could be much faster. In
addition, Datomic has some rather advanced caching logic that should help
with data usage if you are writing the data to a transactor (i.e. using the
in-memory storage won't help much here; use free or dev storage).
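
Once the frames live in Datomic, an indexed lookup is just an ordinary datalog query against the db value. A sketch, with a made-up :frame/src-ip attribute:

```clojure
(require '[datomic.api :as d])

;; Find all frame entities from a given source IP; the query engine can
;; satisfy this from the AVET index rather than scanning every datom.
(d/q '[:find ?frame
       :in $ ?ip
       :where [?frame :frame/src-ip ?ip]]
     (d/db conn)
     "10.0.0.1")
```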

Timothy







-- 
“One of the main causes of the fall of the Roman Empire was that–lacking
zero–they had no way to indicate successful termination of their C
programs.”
(Robert Firth)



Re: Architectural doubts

2015-01-31 Thread John Wiseman
One trick I've used to speed up use of gloss is to use a lazy map so I only
actually parse a piece of data if I need to.  In my particular application
this was an easy optimization because I was parsing a tagged data format,
and I could do minimal parsing just to determine the tag of the next chunk
of data, and then build a lazy map entry that used a gloss codec specific
to that tag.  I was also lucky that in most cases only a few tags are of
interest to anyone, so I was able to avoid lots of parsing overhead.

Without this optimization gloss might have been too slow for my purposes,
which is unfortunate.
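
The lazy-map trick can be sketched with plain delays: peek at the tag cheaply, and defer the expensive per-tag decode until someone actually asks for it. Names here (`read-tag`, `codec-for-tag`) are hypothetical helpers, and the decode call assumes a gloss-style `decode`:

```clojure
;; Sketch: only frames whose :payload is actually deref'd pay the full
;; parsing cost; everything else stays as raw bytes plus a tag.
(defn lazy-frame [raw-bytes read-tag codec-for-tag decode]
  (let [tag (read-tag raw-bytes)]          ; minimal parse: just the tag
    {:tag     tag
     :payload (delay (decode (codec-for-tag tag) raw-bytes))}))

;; Usage: @(:payload frame) forces the decode for that one frame.
```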



Architectural doubts

2015-01-31 Thread Milton Silva
While using Wireshark to analyse libpcap files (= 200 MB) I routinely
think that it would be great to perform relational queries but, Wireshark
only supports search.

I thought I would decode the entire file, hold it in memory as Clojure data
structures and use Datomic's datalog.

Besides relational queries, a requirement is for the file to be decoded
(libpcap, Ethernet II, IP, TCP, Diameter) in less than a minute (for a
200MB file) and the typical queries should also take less than a minute.

I thought the frames could be represented like this:

{:frame-id 1
:timestamp java's instant-object
:src-mac string
:dest-mac string
:src-ip
:dest-ip ...
...}

{:frame-ids [1 3]
:diameter-session-id ...}

So, I started by using gloss to decode a 200MB file. Gloss is fantastic
for specifying frames but, it is not meeting the time requirements. It appears
the problem has to do with the creation of a lot of objects. Even with 3G of
RAM for the heap, it still grinds to a halt.

I could try to perform some experiments to determine approximate answers
but, I think it is better to talk with people with more experience in order
to avoid common pitfalls.
My questions are:

Will the JVM (with 3G) support a million hashmaps like the above?
Is Buffy able to do something like what I want?
Will Datomic be able to handle this use case?
What would you suggest to solve this (e.g. don't use Clojure data
structures.. but then Datomic's datalog is not available to query?)?
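
For concreteness, a gloss codec for the libpcap per-record header might be specified roughly like this (a sketch; the field names follow the pcap record layout, but verify endianness against your file's magic number):

```clojure
(require '[gloss.core :refer [defcodec ordered-map]])

;; Each libpcap record starts with four little-endian 32-bit fields.
(defcodec pcap-record-header
  (ordered-map :ts-sec   :uint32-le   ; timestamp, seconds
               :ts-usec  :uint32-le   ; timestamp, microseconds
               :incl-len :uint32-le   ; bytes captured in this record
               :orig-len :uint32-le)) ; original length on the wire
```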



Re: Architectural doubts

2015-01-31 Thread Jan-Paul Bultmann
Why not stream frames directly into the Datomic db as they fall out of gloss?
This should be faster at query time anyhow due to indexes,
and lets Datomic handle the memory management.

cheers Jan
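
Streaming the frames in could look roughly like the sketch below (attribute names are hypothetical, and the tempid form matches the Datomic API of that era):

```clojure
(require '[datomic.api :as d])

;; Sketch: transact each decoded frame as it arrives, rather than holding
;; the whole file in memory as Clojure maps.
(doseq [frame decoded-frames]
  @(d/transact conn
               [{:db/id          (d/tempid :db.part/user)
                 :frame/id       (:frame-id frame)
                 :frame/src-mac  (:src-mac frame)
                 :frame/dest-mac (:dest-mac frame)}]))
```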

