Re: Diff between object graphs?
On Apr 23, 2015, at 11:05 AM, Steve Smaldone <smald...@gmail.com> wrote:
> On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan <cfkar...@gmail.com> wrote:
>> On Apr 23, 2015, at 1:59 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>>> On Thursday 23 April 2015 11:53, Cem Karan wrote:
>>>> Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware-in-the-loop simulations, then that approach won't work.
>>>
>>> That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.
>>
>> Which is why I mentioned keeping the seed above. The problem is that I eventually want to do hardware in the loop, which will involve IO between the simulation machine and the actual robots, and IO timing is imprecise and uncontrollable. That is where not recording something becomes lossy.
>>
>> That said, the mere act of trying to record everything is going to cause timing issues, so I guess I'm overthinking things yet again.
>>
>> Thanks for the help, everyone; it's helped me clarify what I need to do in my mind.
>
> Well, you could achieve this on Linux by using the rdiff library. Not exactly a purely Python solution, but it would give you file-based diffs.
>
> Basically, what you could do is write the first file. Then for each subsequent save, write out the file (as a temp file) and issue shell commands (via the Python script) to calculate the diff of the new file against the first (basis) file. Once you remove the temp files, you'd have a full first save and a set of diffs against that file. You could rehydrate any save you want by applying its diff to the basis. If you work on it a bit, you might even be able to avoid the temp file saves by using pipes in the shell command.
>
> Of course, I haven't tested this, so there may be non-obvious issues with diffing between subsequent pickled saves, but it seems that it should work on the surface.

That might work... although I'm running on OS X right now, once I get to the hardware-in-the-loop part, it's all going to be some flavor of Linux. I'll look into it... thanks!

Thanks,
Cem Karan
-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Diff between object graphs?
On Thu, Apr 23, 2015 at 6:34 AM, Cem Karan <cfkar...@gmail.com> wrote:
> On Apr 23, 2015, at 1:59 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>> On Thursday 23 April 2015 11:53, Cem Karan wrote:
>>> Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware-in-the-loop simulations, then that approach won't work.
>>
>> That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.
>
> Which is why I mentioned keeping the seed above. The problem is that I eventually want to do hardware in the loop, which will involve IO between the simulation machine and the actual robots, and IO timing is imprecise and uncontrollable. That is where not recording something becomes lossy.
>
> That said, the mere act of trying to record everything is going to cause timing issues, so I guess I'm overthinking things yet again.
>
> Thanks for the help, everyone; it's helped me clarify what I need to do in my mind.

Well, you could achieve this on Linux by using the rdiff library. Not exactly a purely Python solution, but it would give you file-based diffs.

Basically, what you could do is write the first file. Then for each subsequent save, write out the file (as a temp file) and issue shell commands (via the Python script) to calculate the diff of the new file against the first (basis) file. Once you remove the temp files, you'd have a full first save and a set of diffs against that file. You could rehydrate any save you want by applying its diff to the basis. If you work on it a bit, you might even be able to avoid the temp file saves by using pipes in the shell command.

Of course, I haven't tested this, so there may be non-obvious issues with diffing between subsequent pickled saves, but it seems that it should work on the surface.

Good luck!

SS
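The idea of keeping one full basis save plus a delta per later save can be tried without leaving Python. The sketch below is a stand-in for rdiff, not a reimplementation of it: it runs the standard library's `difflib.SequenceMatcher` over the raw pickle bytes, which is fine for small experiments but scales far worse than rdiff's rsync-style rolling checksums. `make_delta` and `apply_delta` are hypothetical names chosen for illustration.

```python
import difflib
import pickle


def make_delta(basis: bytes, new: bytes) -> list:
    """Describe `new` as a list of operations against `basis`.

    ("copy", start, end) reuses basis[start:end]; ("insert", data) adds literal bytes.
    The result is a plain list of tuples, so it can itself be pickled.
    """
    matcher = difflib.SequenceMatcher(None, basis, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no operation: those basis bytes are simply not copied.
    return delta


def apply_delta(basis: bytes, delta: list) -> bytes:
    """Rehydrate a save from the basis file plus its delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += basis[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


# Usage: one full basis save, then a delta for a later save.
basis = pickle.dumps({"t": 0, "positions": list(range(50))})
later = pickle.dumps({"t": 1, "positions": list(range(50)) + [99]})
delta = make_delta(basis, later)
assert apply_delta(basis, delta) == later
```

Diffing every save against the first one, as suggested, keeps each restore to a single patch step, at the cost of deltas that grow as the state drifts away from the basis.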
Re: Diff between object graphs?
On Thursday 23 April 2015 11:53, Cem Karan wrote:
> Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware-in-the-loop simulations, then that approach won't work.

That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.

Obviously you need a high-quality PRNG like the Mersenne Twister, as used by Python, and you need to ensure that the distribution of values matches that of the real-life events. If you are truly paranoid, you might even run the simulation twice, using independent PRNGs (e.g. Mersenne Twister for one run, Marsaglia xorshift generator for another), and compare the results. But given that you are using a high-quality generator in the first place, that is unlikely to gain you anything.

(MT is uniformly distributed with no correlations in up to 623 dimensions. I suppose it is possible that, if your simulation involves a phase space with more than 623 dimensions, it may inadvertently find correlations in the random numbers.)

There's no benefit (except maybe speed, and probably not that) to using unrepeatable real random numbers. Using real randomness for simulations is a bad idea because it means you can never run the same simulation twice, and you are forced to store large amounts of data instead of just storing the seed and then running the simulation again.

-- 
Steve
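To make Steven's point concrete: a seeded generator makes a run reproducible, so the seed is all that needs storing, and the "truly paranoid" cross-check just means driving the same code with a second, independent generator. In this sketch, `simulate` is a hypothetical toy (the mean of many uniform draws), and `XorShift32` is a minimal version of Marsaglia's 32-bit xorshift with the 13/17/5 shift triple, included only to serve as that second generator.

```python
import random


class XorShift32:
    """Marsaglia's 32-bit xorshift generator (shift triple 13, 17, 5).

    Only a sketch of an *independent* generator for a cross-check run;
    it is far weaker than the Mersenne Twister behind random.Random.
    """

    MASK = 0xFFFFFFFF

    def __init__(self, seed: int):
        self.state = (seed & self.MASK) or 1  # the state must never be zero

    def random(self) -> float:
        x = self.state
        x ^= (x << 13) & self.MASK
        x ^= x >> 17
        x ^= (x << 5) & self.MASK
        self.state = x
        return x / 2**32  # float strictly between 0 and 1


def simulate(rng, steps: int = 10_000) -> float:
    """Toy stand-in for a simulation: the mean of `steps` uniform draws."""
    return sum(rng.random() for _ in range(steps)) / steps


seed = 2015
# Same generator, same seed: identical runs, so storing the seed is enough.
assert simulate(random.Random(seed)) == simulate(random.Random(seed))
# Independent generators should agree statistically (both near 0.5), not exactly.
print(simulate(random.Random(seed)), simulate(XorShift32(seed)))
```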
Re: Diff between object graphs?
On Apr 23, 2015, at 1:59 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Thursday 23 April 2015 11:53, Cem Karan wrote:
>> Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware-in-the-loop simulations, then that approach won't work.
>
> That's exactly why we have *pseudo* random number generators. They are statistically indistinguishable from real randomness, but repeatable when needed.

Which is why I mentioned keeping the seed above. The problem is that I eventually want to do hardware in the loop, which will involve IO between the simulation machine and the actual robots, and IO timing is imprecise and uncontrollable. That is where not recording something becomes lossy.

That said, the mere act of trying to record everything is going to cause timing issues, so I guess I'm overthinking things yet again.

Thanks for the help, everyone; it's helped me clarify what I need to do in my mind.

Thanks,
Cem Karan
Re: Diff between object graphs?
On Wed, Apr 22, 2015 at 8:11 AM, Rustom Mody <rustompm...@gmail.com> wrote:
> On Wednesday, April 22, 2015 at 4:07:35 PM UTC+5:30, Cem Karan wrote:
>> Hi all,
>>
>> I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly.
>
> No answer to your questions... But you do know that bzip is rather worse than gzip in time and not really so much better in space, don't you??
> http://tukaani.org/lzma/benchmarks.html

I had no idea; I'll try my tests using gzip as well, just to see. That said, I could still use the diff between object graphs; saving less state is definitely going to be a speed/space improvement over saving everything!

Thanks,
Cem Karan
Re: Diff between object graphs?
Cem Karan wrote:
> Hi all,
>
> I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly.
>
> This leaves me with two choices: first, pick the data I want to save, or second, find a way of generating diffs between object graphs. Since I don't yet know all the questions I want to ask, I don't want to throw away information prematurely, which is why I would prefer to avoid the first option.
>
> So that brings up the second option: generating diffs between object graphs. I've searched around in the standard library and on PyPI, but I haven't yet found a library that does what I want. Does anyone know of something that does? Basically, I want something with the following ability:
>
>     Object_graph_2 - Object_graph_1 = diff_2_1
>     Object_graph_1 + diff_2_1 = Object_graph_2
>
> The object graphs are already pickleable, and the diffs must be too, or this won't work. I can use deepcopy to ensure the two object graphs are completely separate, so the diffing engine doesn't need to worry about that part.
>
> Anyone know of such a thing?

A poor man's approach: do not compress the pickled data; check it into version control. Getting the n-th state then becomes checking out the n-th revision of the file. I have no idea how much space you save that way, but it's simple enough to give it a try.

Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
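Peter's checkpoint-and-replay idea, sketched under two assumptions: events are pickleable, and applying them is deterministic. The state here is a plain dict and `handle_event` is a hypothetical handler; both are stand-ins for the real simulator.

```python
import pickle

CHECKPOINT_EVERY = 100  # "every 100th (for example) event"


def handle_event(state: dict, event) -> None:
    """Hypothetical event handler: here an event is (counter_name, increment)."""
    name, amount = event
    state[name] = state.get(name, 0) + amount


def run(events):
    """Process events in order, pickling the state after every CHECKPOINT_EVERY events."""
    state = {}
    checkpoints = {0: pickle.dumps(state)}
    for n, event in enumerate(events, start=1):
        handle_event(state, event)
        if n % CHECKPOINT_EVERY == 0:
            checkpoints[n] = pickle.dumps(state)
    return checkpoints


def state_after(n, checkpoints, events):
    """Rebuild the state after event n (1-based): load the nearest checkpoint, replay the rest."""
    base = (n // CHECKPOINT_EVERY) * CHECKPOINT_EVERY  # e.g. 7531 -> 7500
    state = pickle.loads(checkpoints[base])
    for event in events[base:n]:  # events base+1 .. n in 1-based numbering
        handle_event(state, event)
    return state


events = [("robot%d" % (i % 3), 1) for i in range(8000)]
checkpoints = run(events)
print(state_after(7531, checkpoints, events))
```

Storage then scales with the number of events plus one snapshot per hundred, and the worst-case restore replays 99 events.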
Re: Diff between object graphs?
On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
> Cem Karan wrote:
>> Hi all,
>>
>> I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly.
>>
>> This leaves me with two choices: first, pick the data I want to save, or second, find a way of generating diffs between object graphs. Since I don't yet know all the questions I want to ask, I don't want to throw away information prematurely, which is why I would prefer to avoid the first option.
>>
>> So that brings up the second option: generating diffs between object graphs. I've searched around in the standard library and on PyPI, but I haven't yet found a library that does what I want. Does anyone know of something that does? Basically, I want something with the following ability:
>>
>>     Object_graph_2 - Object_graph_1 = diff_2_1
>>     Object_graph_1 + diff_2_1 = Object_graph_2
>>
>> The object graphs are already pickleable, and the diffs must be too, or this won't work. I can use deepcopy to ensure the two object graphs are completely separate, so the diffing engine doesn't need to worry about that part.
>>
>> Anyone know of such a thing?
>
> A poor man's approach: do not compress the pickled data; check it into version control. Getting the n-th state then becomes checking out the n-th revision of the file. I have no idea how much space you save that way, but it's simple enough to give it a try.

Sounds like a good approach; I'll give it a shot in the morning.

> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.

I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.

Thanks,
Cem Karan
Re: Diff between object graphs?
On Apr 22, 2015, at 9:56 PM, Dave Angel <da...@davea.name> wrote:
> On 04/22/2015 09:46 PM, Chris Angelico wrote:
>> On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel <da...@davea.name> wrote:
>>> On 04/22/2015 09:30 PM, Cem Karan wrote:
>>>> On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
>>>>> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
>>>>
>>>> I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.
>>>
>>> Why would that lose any information???
>>
>> It loses information if event processing isn't perfectly deterministic.
>
> Quite right. But I hadn't seen anything in this thread to imply that.

My apologies, that's my fault. I should have mentioned that in the first place.

Thanks,
Cem Karan
Re: Diff between object graphs?
On 04/22/2015 09:30 PM, Cem Karan wrote:
> On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
>> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
>
> I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.

Why would that lose any information???

-- 
DaveA
Re: Diff between object graphs?
On Apr 22, 2015, at 9:46 PM, Chris Angelico <ros...@gmail.com> wrote:
> On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel <da...@davea.name> wrote:
>> On 04/22/2015 09:30 PM, Cem Karan wrote:
>>> On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
>>>> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
>>>
>>> I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.
>>
>> Why would that lose any information???
>
> It loses information if event processing isn't perfectly deterministic.

Precisely. In order to make my simulations more realistic, I use a lot of random numbers. I can fake things by keeping the seed to the generator, but if I want to do any sort of hardware-in-the-loop simulations, then that approach won't work.

Thanks,
Cem Karan
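Where the non-determinism comes from outside the program (robot I/O, timing), one common way to keep replay exact is to record every externally sourced value as it arrives and feed the log back on replay; the recorded inputs then play the role the seed plays for a PRNG. This is a sketch with hypothetical names (`RecordingSource`, `ReplaySource`, and a toy loop), using `random.SystemRandom` as a stand-in for hardware. As Cem notes elsewhere in the thread, recording itself perturbs timing, so this only makes the *replay* deterministic, not the live run.

```python
import pickle
import random
import time


class RecordingSource:
    """Wraps a non-deterministic input and logs every value it returns.

    `read_fn` is a hypothetical stand-in for talking to the real hardware.
    """

    def __init__(self, read_fn):
        self.read_fn = read_fn
        self.log = []  # (monotonic timestamp, value) pairs; pickleable

    def read(self):
        value = self.read_fn()
        self.log.append((time.monotonic(), value))
        return value


class ReplaySource:
    """Feeds a recorded log back in the original order, so a rerun is deterministic."""

    def __init__(self, log):
        self._values = iter([value for _, value in log])

    def read(self):
        return next(self._values)


def run_simulation(source, steps):
    """Toy simulation loop whose only non-determinism comes from `source`."""
    total, trace = 0.0, []
    for _ in range(steps):
        total += source.read()
        trace.append(total)
    return trace


live = RecordingSource(random.SystemRandom().random)  # pretend this is robot I/O
live_trace = run_simulation(live, 10)
saved_log = pickle.dumps(live.log)
replayed = run_simulation(ReplaySource(pickle.loads(saved_log)), 10)
assert replayed == live_trace
```

The timestamps are kept in the log for later analysis; this replay ignores them and only preserves the order of inputs.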
Re: Diff between object graphs?
On 04/22/2015 09:46 PM, Chris Angelico wrote:
> On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel <da...@davea.name> wrote:
>> On 04/22/2015 09:30 PM, Cem Karan wrote:
>>> On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
>>>> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
>>>
>>> I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.
>>
>> Why would that lose any information???
>
> It loses information if event processing isn't perfectly deterministic.

Quite right. But I hadn't seen anything in this thread to imply that.

I used an approach like that on the Game of Life, in 1976. I saved every 10th or so state, and was able to run the simulation backwards by going forward from the previous saved state. In this case, the analogue of the event is determined from the previous state. But it's quite similar, and quite deterministic.

-- 
DaveA
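Dave's Game of Life trick, as a sketch: keep every 10th generation, and reconstruct any other one by stepping forward from the nearest earlier saved generation, which also lets you walk the run backwards. It works only because each step is a pure function of the previous state. The set-of-live-cells representation and the function names are illustrative choices, not Dave's original code.

```python
from collections import Counter

SAVE_EVERY = 10  # "every 10th or so state"


def life_step(cells: frozenset) -> frozenset:
    """One Game of Life generation; `cells` is the set of live (x, y) positions."""
    neighbors = Counter(
        (x + dx, y + dy)
        for x, y in cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    return frozenset(
        cell for cell, n in neighbors.items() if n == 3 or (n == 2 and cell in cells)
    )


def run_life(initial: frozenset, generations: int) -> dict:
    """Run forward, keeping only every SAVE_EVERY-th generation."""
    saved = {0: initial}
    cells = initial
    for g in range(1, generations + 1):
        cells = life_step(cells)
        if g % SAVE_EVERY == 0:
            saved[g] = cells
    return saved


def generation(g: int, saved: dict) -> frozenset:
    """Recompute generation g by stepping forward from the previous saved state."""
    base = (g // SAVE_EVERY) * SAVE_EVERY
    cells = saved[base]
    for _ in range(g - base):
        cells = life_step(cells)
    return cells


def run_backwards(last: int, saved: dict):
    """Yield generations last, last - 1, ..., 0 without storing every one."""
    for g in range(last, -1, -1):
        yield generation(g, saved)


glider = frozenset({(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)})
saved = run_life(glider, 35)
backwards = list(run_backwards(35, saved))
```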
Re: Diff between object graphs?
On Thu, Apr 23, 2015 at 11:37 AM, Dave Angel <da...@davea.name> wrote:
> On 04/22/2015 09:30 PM, Cem Karan wrote:
>> On Apr 22, 2015, at 8:53 AM, Peter Otten <__pete...@web.de> wrote:
>>> Another slightly more involved idea: make the events pickleable, and save the simulator only for every 100th (for example) event. To restore the 7531st state, load pickle 7500 and apply events 7501 to 7531.
>>
>> I was hoping to avoid doing this as I lose information. BUT, it's likely that this will be the best approach regardless of what other methods I use; there is just too much data.
>
> Why would that lose any information???

It loses information if event processing isn't perfectly deterministic.

ChrisA
Diff between object graphs?
Hi all,

I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly.

This leaves me with two choices: first, pick the data I want to save, or second, find a way of generating diffs between object graphs. Since I don't yet know all the questions I want to ask, I don't want to throw away information prematurely, which is why I would prefer to avoid the first option.

So that brings up the second option: generating diffs between object graphs. I've searched around in the standard library and on PyPI, but I haven't yet found a library that does what I want. Does anyone know of something that does? Basically, I want something with the following ability:

    Object_graph_2 - Object_graph_1 = diff_2_1
    Object_graph_1 + diff_2_1 = Object_graph_2

The object graphs are already pickleable, and the diffs must be too, or this won't work. I can use deepcopy to ensure the two object graphs are completely separate, so the diffing engine doesn't need to worry about that part.

Anyone know of such a thing?

Thanks,
Cem Karan
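A minimal sketch of the subtract/add pair asked for above, under a strong simplifying assumption: the state is a tree of plain dicts (for example built from `vars()` of each object), with no shared references or cycles. Anything that is not a dict, lists included, is replaced wholesale when it changes. `diff` and `patch` are hypothetical names; this is not an existing library.

```python
import copy
import pickle


def diff(old: dict, new: dict, path: tuple = ()):
    """Return (changes, deletions) that turn nested dict `old` into `new`.

    changes maps a key path (a tuple of keys) to its new value; deletions lists
    key paths to remove. Values are compared with ==, and anything that is not
    a dict (lists included) is replaced wholesale.
    """
    changes, deletions = {}, []
    for key in old.keys() - new.keys():
        deletions.append(path + (key,))
    for key, value in new.items():
        sub = path + (key,)
        if key not in old:
            changes[sub] = copy.deepcopy(value)
        elif isinstance(old[key], dict) and isinstance(value, dict):
            sub_changes, sub_deletions = diff(old[key], value, sub)
            changes.update(sub_changes)
            deletions.extend(sub_deletions)
        elif old[key] != value:
            changes[sub] = copy.deepcopy(value)
    return changes, deletions


def _parent(root: dict, path: tuple) -> dict:
    """Walk down to the dict that holds the last key of `path`."""
    for key in path[:-1]:
        root = root[key]
    return root


def patch(old: dict, delta) -> dict:
    """Return a new graph, old + delta; `old` itself is left untouched."""
    changes, deletions = delta
    result = copy.deepcopy(old)
    for path in deletions:
        del _parent(result, path)[path[-1]]
    for path, value in changes.items():
        _parent(result, path)[path[-1]] = copy.deepcopy(value)
    return result


g1 = {"t": 0, "robots": {"r1": {"x": 0, "y": 0}, "r2": {"x": 5, "y": 5}}}
g2 = {"t": 1, "robots": {"r1": {"x": 1, "y": 0}, "r3": {"x": 9, "y": 9}}}
d = diff(g1, g2)                        # "Object_graph_2 - Object_graph_1"
assert patch(g1, d) == g2               # "Object_graph_1 + diff_2_1"
assert pickle.loads(pickle.dumps(d)) == d  # the diff itself is pickleable
```

Handling arbitrary objects, shared references, and cycles is where the real difficulty lies; this sketch deliberately sidesteps all three.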
Re: Diff between object graphs?
On Wednesday, April 22, 2015 at 4:07:35 PM UTC+5:30, Cem Karan wrote:
> Hi all,
>
> I need some help. I'm working on a simple event-based simulator for my dissertation research. The simulator has state information that I want to analyze as a post-simulation step, so I currently save (pickle) the entire simulator every time an event occurs; this lets me analyze the simulation at any moment in time, and ask questions that I haven't thought of yet. The problem is that pickling this amount of data is both time-consuming and a space hog. This is true even when using bz2.open() to create a compressed file on the fly.

No answer to your questions... But you do know that bzip is rather worse than gzip in time and not really so much better in space, don't you??
http://tukaani.org/lzma/benchmarks.html
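Rather than rely on published benchmarks, the trade-off Rustom mentions is easy to measure on your own data with the standard library's `gzip`, `bz2`, and `lzma` modules. The snapshot below is a hypothetical placeholder, and timings vary by machine and data, so no numbers are shown.

```python
import bz2
import gzip
import lzma
import pickle
import time


def compare_compressors(obj) -> dict:
    """Return {codec: (compressed size in bytes, seconds to compress)} for one pickle."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    results = {"none": (len(raw), 0.0)}
    for name, codec in (("gzip", gzip), ("bz2", bz2), ("lzma", lzma)):
        start = time.perf_counter()
        packed = codec.compress(raw)
        elapsed = time.perf_counter() - start
        if codec.decompress(packed) != raw:  # sanity check: the round trip is lossless
            raise RuntimeError(name + " round trip failed")
        results[name] = (len(packed), elapsed)
    return results


# Hypothetical snapshot; substitute the real simulator object here.
snapshot = {"t": 42, "grid": [[0] * 100 for _ in range(100)]}
for name, (size, seconds) in compare_compressors(snapshot).items():
    print(f"{name:5} {size:8d} bytes {seconds * 1000:8.2f} ms")
```

For the streaming case, `gzip.open()` and `lzma.open()` accept a filename and mode just as `bz2.open()` does, so switching codecs for on-the-fly compression is a one-line change.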