Re: Diff between object graphs?

2015-04-24 Thread Cem Karan

On Apr 23, 2015, at 11:05 AM, Steve Smaldone smald...@gmail.com wrote:

 On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan cfkar...@gmail.com wrote:
 
 On Apr 23, 2015, at 1:59 AM, Steven D'Aprano 
 steve+comp.lang.pyt...@pearwood.info wrote:
 
  On Thursday 23 April 2015 11:53, Cem Karan wrote:
 
  Precisely.  In order to make my simulations more realistic, I use a lot of
  random numbers.  I can fake things by keeping the seed to the generator,
  but if I want to do any sort of hardware in the loop simulations, then
  that approach won't work.
 
  That's exactly why we have *pseudo* random number generators. They are
  statistically indistinguishable from real randomness, but repeatable when
  needed.
 
 Which is why I mentioned keeping the seed above.  The problem is that 
 I eventually want to do hardware in the loop, which will involve IO between 
 the simulation machine and the actual robots, and IO timing is imprecise and 
 uncontrollable.  That is where not recording something becomes lossy.  That 
 said, the mere act of trying to record everything is going to cause timing 
 issues, so I guess I'm overthinking things yet again.
 
 Thanks for the help, everyone; it's helped me clarify what I need to do in my 
 mind.
 
  
 Well, you could achieve this on Linux by using the rdiff library.  Not 
 exactly a purely Python solution, but it would give you file-based diffs.  
 
 Basically, what you could do is write the first file.  Then for each 
 subsequent save, write out the file (as a temp file) and issue shell 
 commands (via the Python script) to calculate the diffs of the new file 
 against the first (basis) file.  Once you remove the temp files, you'd have a 
 full first save and a set of diffs against that file.  You could rehydrate 
 any save you want by applying the diff to the basis.
 
 If you work on it a bit, you might even be able to avoid the temp file saves 
 by using pipes in the shell command.
 
 Of course, I haven't tested this so there may be non-obvious issues with 
 diffing between subsequent pickled saves, but it seems that it should work on 
 the surface.

That might work... although I'm running on OS X right now, once I get to the 
hardware-in-the-loop part, it's all going to be some flavor of Linux.  I'll 
look into it... thanks!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-23 Thread Steve Smaldone
On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan cfkar...@gmail.com wrote:


 On Apr 23, 2015, at 1:59 AM, Steven D'Aprano 
 steve+comp.lang.pyt...@pearwood.info wrote:

  On Thursday 23 April 2015 11:53, Cem Karan wrote:
 
  Precisely.  In order to make my simulations more realistic, I use a lot
 of
  random numbers.  I can fake things by keeping the seed to the generator,
  but if I want to do any sort of hardware in the loop simulations, then
  that approach won't work.
 
  That's exactly why we have *pseudo* random number generators. They are
  statistically indistinguishable from real randomness, but repeatable
 when
  needed.

 Which is why I mentioned keeping the seed above.  The problem is
 that I eventually want to do hardware in the loop, which will involve IO
 between the simulation machine and the actual robots, and IO timing is
 imprecise and uncontrollable.  That is where not recording something
 becomes lossy.  That said, the mere act of trying to record everything is
 going to cause timing issues, so I guess I'm overthinking things yet again.

 Thanks for the help, everyone; it's helped me clarify what I need to do in
 my mind.


Well, you could achieve this on Linux by using the rdiff library.  Not
exactly a purely Python solution, but it would give you file-based diffs.

Basically, what you could do is write the first file.  Then for each
subsequent save, write out the file (as a temp file) and issue shell
commands (via the Python script) to calculate the diffs of the new file
against the first (basis) file.  Once you remove the temp files, you'd have
a full first save and a set of diffs against that file.  You could
rehydrate any save you want by applying the diff to the basis.

If you work on it a bit, you might even be able to avoid the temp file
saves by using pipes in the shell command.

Of course, I haven't tested this so there may be non-obvious issues with
diffing between subsequent pickled saves, but it seems that it should work
on the surface.
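
For what it's worth, a rough (and untested) sketch of that workflow in Python
might look like the following; it assumes the rdiff command-line tool from
librsync is on the PATH, and the file names are made up:

import subprocess

def store_delta(basis, new_save, delta):
    # Compute a delta of new_save against the basis file using rdiff's
    # signature/delta subcommands; only the delta needs to be kept.
    sig = basis + ".sig"
    subprocess.check_call(["rdiff", "signature", basis, sig])
    subprocess.check_call(["rdiff", "delta", sig, new_save, delta])

def rehydrate(basis, delta, out_path):
    # Rebuild the full save from the basis plus its delta.
    subprocess.check_call(["rdiff", "patch", basis, delta, out_path])

# Hypothetical usage:
#   store_delta("save_0000.pickle", "save_0007.pickle", "save_0007.delta")
#   rehydrate("save_0000.pickle", "save_0007.delta", "save_0007_rebuilt.pickle")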

Good luck!

SS
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-23 Thread Steven D'Aprano
On Thursday 23 April 2015 11:53, Cem Karan wrote:

 Precisely.  In order to make my simulations more realistic, I use a lot of
 random numbers.  I can fake things by keeping the seed to the generator,
 but if I want to do any sort of hardware in the loop simulations, then
 that approach won't work.

That's exactly why we have *pseudo* random number generators. They are 
statistically indistinguishable from real randomness, but repeatable when 
needed.

Obviously you need a high-quality PRNG like the Mersenne Twister, as used by 
Python, and you need to ensure that the distribution of values matches that 
of the real-life events. If you are truly paranoid, you might even run the 
simulation twice, using independent PRNGs (e.g. Mersenne Twister for one 
run, Marsaglia xorshift generator for another), and compare the results. But 
given that you are using a high-quality generator in the first place, that 
is unlikely to gain you anything.

(MT is uniformly distributed with no correlations in up to 623 dimensions. I 
suppose it is possible that, if your simulation involves a phase space with more 
than 623 dimensions, it may inadvertently find correlations in the random 
numbers.)

There's no benefit (except maybe speed, and probably not that) to using 
unrepeatable real random numbers. Using real randomness for simulations is 
a bad idea because it means you can never run the same simulation twice, and 
you are forced to store large amounts of data instead of just storing the 
seed and running the simulation again.
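
For example, a minimal sketch with the standard library:

import random

SEED = 12345

def simulate(seed, n=5):
    rng = random.Random(seed)   # a private Mersenne Twister instance
    return [rng.random() for _ in range(n)]

# Two runs from the same seed are identical, so only the seed needs storing.
assert simulate(SEED) == simulate(SEED)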



-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-23 Thread Cem Karan

On Apr 23, 2015, at 1:59 AM, Steven D'Aprano 
steve+comp.lang.pyt...@pearwood.info wrote:

 On Thursday 23 April 2015 11:53, Cem Karan wrote:
 
 Precisely.  In order to make my simulations more realistic, I use a lot of
 random numbers.  I can fake things by keeping the seed to the generator,
 but if I want to do any sort of hardware in the loop simulations, then
 that approach won't work.
 
 That's exactly why we have *pseudo* random number generators. They are 
 statistically indistinguishable from real randomness, but repeatable when 
 needed.

Which is why I mentioned keeping the seed above.  The problem is that I 
eventually want to do hardware in the loop, which will involve IO between the 
simulation machine and the actual robots, and IO timing is imprecise and 
uncontrollable.  That is where not recording something becomes lossy.  That 
said, the mere act of trying to record everything is going to cause timing 
issues, so I guess I'm overthinking things yet again.

Thanks for the help, everyone; it's helped me clarify what I need to do in my 
mind.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread CFK
On Wed, Apr 22, 2015 at 8:11 AM, Rustom Mody rustompm...@gmail.com wrote:

 On Wednesday, April 22, 2015 at 4:07:35 PM UTC+5:30, Cem Karan wrote:
  Hi all, I need some help.  I'm working on a simple event-based simulator
 for my dissertation research. The simulator has state information that I
 want to analyze as a post-simulation step, so I currently save (pickle) the
 entire simulator every time an event occurs; this lets me analyze the
 simulation at any moment in time, and ask questions that I haven't thought
 of yet.  The problem is that pickling this amount of data is both
 time-consuming and a space hog.  This is true even when using bz2.open() to
 create a compressed file on the fly.

 No answer to your questions...
 But you do know that bzip2 is rather worse than gzip in time,
 and not really so much better in space, don't you?
 http://tukaani.org/lzma/benchmarks.html


I had no idea; I'll try my tests using gzip as well, just to see.

That said, I could still use the diff between object graphs; saving less
state is definitely going to be a speed/space improvement over saving
everything!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Peter Otten
Cem Karan wrote:

 Hi all, I need some help.  I'm working on a simple event-based simulator
 for my dissertation research. The simulator has state information that I
 want to analyze as a post-simulation step, so I currently save (pickle)
 the entire simulator every time an event occurs; this lets me analyze the
 simulation at any moment in time, and ask questions that I haven't thought
 of yet.  The problem is that pickling this amount of data is both
 time-consuming and a space hog.  This is true even when using bz2.open()
 to create a compressed file on the fly.
 
 This leaves me with two choices; first, pick the data I want to save, and
 second, find a way of generating diffs between object graphs.  Since I
 don't yet know all the questions I want to ask, I don't want to throw away
 information prematurely, which is why I would prefer to avoid scenario 1.
 
 So that brings up possibility two; generating diffs between object graphs.
  I've searched around in the standard library and on pypi, but I haven't
 yet found a library that does what I want.  Does anyone know of something
 that does?
 
 Basically, I want something with the following ability:
 
 Object_graph_2 - Object_graph_1 = diff_2_1
 Object_graph_1 + diff_2_1 = Object_graph_2
 
 The object graphs are already pickleable, and the diffs must be, or this
 won't work.  I can use deepcopy to ensure the two object graphs are
 completely separate, so the diffing engine doesn't need to worry about
 that part.
 
 Anyone know of such a thing?

A poor man's approach:

Do not compress the pickled data; check it into version control instead. Getting the 
n-th state then becomes checking out the n-th revision of the file.

I have no idea how much space you save that way, but it's simple enough to 
give it a try.
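
Untested sketch of that (assumes a git repository is already initialised in
the working directory; the file name is arbitrary):

import pickle
import subprocess

STATE_FILE = "state.pickle"

def save_state(simulator):
    with open(STATE_FILE, "wb") as f:
        pickle.dump(simulator, f)   # uncompressed, so git's delta compression has a chance
    subprocess.check_call(["git", "add", STATE_FILE])
    subprocess.check_call(["git", "commit", "-q", "-m", "simulator snapshot"])

def load_state(revision):
    # e.g. revision="HEAD~10" for the state from 10 snapshots ago
    data = subprocess.check_output(["git", "show", "%s:%s" % (revision, STATE_FILE)])
    return pickle.loads(data)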


Another slightly more involved idea:

Make the events pickleable, and save the simulator only for every 100th (for 
example) event. To restore the 7531st state, load pickle 7500 and apply 
events 7501 to 7531.
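
A rough sketch (untested; simulator.apply() and the file names are
placeholders for whatever your real simulator/event API looks like):

import pickle

CHECKPOINT_EVERY = 100

def run(simulator, events):
    with open("checkpoint_000000.pickle", "wb") as f:
        pickle.dump(simulator, f)                    # initial state
    for i, event in enumerate(events, 1):
        simulator.apply(event)                       # placeholder method
        if i % CHECKPOINT_EVERY == 0:
            with open("checkpoint_%06d.pickle" % i, "wb") as f:
                pickle.dump(simulator, f)
    with open("events.pickle", "wb") as f:
        pickle.dump(events, f)                       # events must be pickleable

def restore(n):
    base = (n // CHECKPOINT_EVERY) * CHECKPOINT_EVERY
    with open("checkpoint_%06d.pickle" % base, "rb") as f:
        simulator = pickle.load(f)
    with open("events.pickle", "rb") as f:
        events = pickle.load(f)
    for event in events[base:n]:                     # replay events base+1 .. n
        simulator.apply(event)
    return simulator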

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:

 Cem Karan wrote:
 
 Hi all, I need some help.  I'm working on a simple event-based simulator
 for my dissertation research. The simulator has state information that I
 want to analyze as a post-simulation step, so I currently save (pickle)
 the entire simulator every time an event occurs; this lets me analyze the
 simulation at any moment in time, and ask questions that I haven't thought
 of yet.  The problem is that pickling this amount of data is both
 time-consuming and a space hog.  This is true even when using bz2.open()
 to create a compressed file on the fly.
 
 This leaves me with two choices; first, pick the data I want to save, and
 second, find a way of generating diffs between object graphs.  Since I
 don't yet know all the questions I want to ask, I don't want to throw away
 information prematurely, which is why I would prefer to avoid scenario 1.
 
 So that brings up possibility two; generating diffs between object graphs.
 I've searched around in the standard library and on pypi, but I haven't
 yet found a library that does what I want.  Does anyone know of something
 that does?
 
 Basically, I want something with the following ability:
 
 Object_graph_2 - Object_graph_1 = diff_2_1
 Object_graph_1 + diff_2_1 = Object_graph_2
 
 The object graphs are already pickleable, and the diffs must be, or this
 won't work.  I can use deepcopy to ensure the two object graphs are
 completely separate, so the diffing engine doesn't need to worry about
 that part.
 
 Anyone know of such a thing?
 
 A poor man's approach:
 
 Do not compress the pickled data, check it into version control. Getting the 
 n-th state then becomes checking out the n-th revision of the file.
 
 I have no idea how much space you save that way, but it's simple enough to 
 give it a try.

Sounds like a good approach; I'll give it a shot in the morning.

 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th (for 
 example) event. To restore the 7531th state load pickle 7500 and apply 
 events 7501 to 7531.

I was hoping to avoid doing this as I lose information.  BUT, it's likely that 
this will be the best approach regardless of what other methods I use; there is 
just too much data.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 9:56 PM, Dave Angel da...@davea.name wrote:

 On 04/22/2015 09:46 PM, Chris Angelico wrote:
 On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:
 On 04/22/2015 09:30 PM, Cem Karan wrote:
 
 
 On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:
 
 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th
 (for
 example) event. To restore the 7531th state load pickle 7500 and apply
 events 7501 to 7531.
 
 
 I was hoping to avoid doing this as I lose information.  BUT, its likely
 that this will be the best approach regardless of what other methods I use;
 there is just too much data.
 
 
 Why would that lose any information???
 
 It loses information if event processing isn't perfectly deterministic.
 
 Quite right.  But I hadn't seen anything in this thread to imply that.

My apologies, that's my fault.  I should have mentioned that in the first place.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Dave Angel

On 04/22/2015 09:30 PM, Cem Karan wrote:


On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:


Another slightly more involved idea:

Make the events pickleable, and save the simulator only for every 100th (for
example) event. To restore the 7531th state load pickle 7500 and apply
events 7501 to 7531.


I was hoping to avoid doing this as I lose information.  BUT, its likely that 
this will be the best approach regardless of what other methods I use; there is 
just too much data.



Why would that lose any information???


--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Cem Karan

On Apr 22, 2015, at 9:46 PM, Chris Angelico ros...@gmail.com wrote:

 On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:
 On 04/22/2015 09:30 PM, Cem Karan wrote:
 
 
 On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:
 
 Another slightly more involved idea:
 
 Make the events pickleable, and save the simulator only for every 100th
 (for
 example) event. To restore the 7531th state load pickle 7500 and apply
 events 7501 to 7531.
 
 
 I was hoping to avoid doing this as I lose information.  BUT, its likely
 that this will be the best approach regardless of what other methods I use;
 there is just too much data.
 
 
 Why would that lose any information???
 
 It loses information if event processing isn't perfectly deterministic.

Precisely.  In order to make my simulations more realistic, I use a lot of 
random numbers.  I can fake things by keeping the seed to the generator, but if 
I want to do any sort of hardware-in-the-loop simulations, then that approach 
won't work.

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Dave Angel

On 04/22/2015 09:46 PM, Chris Angelico wrote:

On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:

On 04/22/2015 09:30 PM, Cem Karan wrote:



On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:


Another slightly more involved idea:

Make the events pickleable, and save the simulator only for every 100th
(for
example) event. To restore the 7531th state load pickle 7500 and apply
events 7501 to 7531.



I was hoping to avoid doing this as I lose information.  BUT, its likely
that this will be the best approach regardless of what other methods I use;
there is just too much data.



Why would that lose any information???


It loses information if event processing isn't perfectly deterministic.


Quite right.  But I hadn't seen anything in this thread to imply that.

I used an approach like that on the Game of Life, in 1976.  I saved 
every 10th or so state, and was able to run the simulation backwards by 
going forward from the previous saved state.  In this case, the analogue 
of the event is determined from the previous state.  But it's quite 
similar, and quite deterministic.




--
DaveA
--
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Chris Angelico
On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel da...@davea.name wrote:
 On 04/22/2015 09:30 PM, Cem Karan wrote:


 On Apr 22, 2015, at 8:53 AM, Peter Otten __pete...@web.de wrote:

 Another slightly more involved idea:

 Make the events pickleable, and save the simulator only for every 100th
 (for
 example) event. To restore the 7531th state load pickle 7500 and apply
 events 7501 to 7531.


 I was hoping to avoid doing this as I lose information.  BUT, its likely
 that this will be the best approach regardless of what other methods I use;
 there is just too much data.


 Why would that lose any information???

It loses information if event processing isn't perfectly deterministic.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Diff between object graphs?

2015-04-22 Thread Cem Karan
Hi all, I need some help.  I'm working on a simple event-based simulator for my 
dissertation research. The simulator has state information that I want to 
analyze as a post-simulation step, so I currently save (pickle) the entire 
simulator every time an event occurs; this lets me analyze the simulation at 
any moment in time, and ask questions that I haven't thought of yet.  The 
problem is that pickling this amount of data is both time-consuming and a space 
hog.  This is true even when using bz2.open() to create a compressed file on 
the fly.  
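
Simplified, what I'm doing now is roughly the following (the file-naming
scheme is made up for illustration):

import bz2
import pickle

def save_snapshot(simulator, event_number):
    # One compressed pickle of the whole simulator per event.
    path = "snapshot_%06d.pickle.bz2" % event_number
    with bz2.open(path, "wb") as f:
        pickle.dump(simulator, f, protocol=pickle.HIGHEST_PROTOCOL)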

This leaves me with two choices: first, pick the data I want to save, and 
second, find a way of generating diffs between object graphs.  Since I don't 
yet know all the questions I want to ask, I don't want to throw away 
information prematurely, which is why I would prefer to avoid scenario 1.  

So that brings up possibility two: generating diffs between object graphs.  
I've searched around in the standard library and on PyPI, but I haven't yet 
found a library that does what I want.  Does anyone know of something that does?

Basically, I want something with the following ability:

Object_graph_2 - Object_graph_1 = diff_2_1
Object_graph_1 + diff_2_1 = Object_graph_2
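
Or, as purely hypothetical function calls (these don't exist; they're what
I'm looking for):

diff_2_1 = graph_diff(object_graph_1, object_graph_2)    # a pickleable delta
object_graph_2 = graph_patch(object_graph_1, diff_2_1)   # rebuilds graph 2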

The object graphs are already pickleable, and the diffs must be, or this won't 
work.  I can use deepcopy to ensure the two object graphs are completely 
separate, so the diffing engine doesn't need to worry about that part.

Anyone know of such a thing?

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Diff between object graphs?

2015-04-22 Thread Rustom Mody
On Wednesday, April 22, 2015 at 4:07:35 PM UTC+5:30, Cem Karan wrote:
 Hi all, I need some help.  I'm working on a simple event-based simulator for 
 my dissertation research. The simulator has state information that I want to 
 analyze as a post-simulation step, so I currently save (pickle) the entire 
 simulator every time an event occurs; this lets me analyze the simulation at 
 any moment in time, and ask questions that I haven't thought of yet.  The 
 problem is that pickling this amount of data is both time-consuming and a 
 space hog.  This is true even when using bz2.open() to create a compressed 
 file on the fly.  

No answer to your questions...
But you do know that bzip2 is rather worse than gzip in time, 
and not really so much better in space, don't you? 
http://tukaani.org/lzma/benchmarks.html
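
A quick (untested) way to check on your own data is to dump the same object
through both compressors and compare time and size:

import bz2
import gzip
import os
import pickle
import time

def dump_and_measure(obj, opener, path):
    start = time.perf_counter()
    with opener(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return time.perf_counter() - start, os.path.getsize(path)

state = {"x": list(range(100000))}   # stand-in for the real simulator state
print("gzip:", dump_and_measure(state, gzip.open, "state.pkl.gz"))
print("bz2: ", dump_and_measure(state, bz2.open, "state.pkl.bz2"))
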
-- 
https://mail.python.org/mailman/listinfo/python-list