Re: How to write temporary data to file?

2007-01-09 Thread Thomas Ploch
Thomas Ploch wrote:
> Laszlo Nagy wrote:
>> Thomas Ploch wrote:
>>> Hi folks,
>>>
>>> I have a data structure that looks like this:
>>>
>>> d = {
>>> 'url1': {
>>> 'emails': ['a', 'b', 'c',...],
>>> 'matches': ['d', 'e', 'f',...]
>>> },
>>> 'url2': {...
>>> }
>>>
>>> This dictionary will get _very_ big, so I want to write it somehow to a
>>> file after it has grown to a certain size.
>>>
>>> How would I achieve that?
>>>   
>> How about dbm/gdbm? Since urls are strings, you can store this dict in a
>> database instance and actually use it from your program as if it were a dict.
>>
>>   Laszlo
>>
> 
> Well, but how do I save the nested dict values? I don't want to eval
> them, so this is no option for me.
> 
> Thomas

I just saw shelve is the module to go for.

Thomas
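
A minimal sketch of how `shelve` could cover the original question (spill the dict to a file once it reaches a certain size), assuming Python 3. The names `MAX_URLS`, `flush`, and `add`, the tiny threshold, and the sample data are hypothetical and only serve the illustration; they are not from the thread.

```python
import os
import shelve
import tempfile

MAX_URLS = 2  # hypothetical threshold; a real crawler would use a far larger one


def flush(buffer, path):
    """Merge the in-memory dict into the shelf at `path`, then empty it."""
    with shelve.open(path) as shelf:
        for url, record in buffer.items():
            stored = shelf.get(url, {"emails": [], "matches": []})
            stored["emails"].extend(record["emails"])
            stored["matches"].extend(record["matches"])
            shelf[url] = stored  # assign the whole record so it is written to disk
    buffer.clear()


def add(buffer, path, url, emails, matches):
    """Collect results in memory and spill to disk once the buffer is full."""
    record = buffer.setdefault(url, {"emails": [], "matches": []})
    record["emails"].extend(emails)
    record["matches"].extend(matches)
    if len(buffer) >= MAX_URLS:
        flush(buffer, path)


path = os.path.join(tempfile.mkdtemp(), "crawl_shelf")
d = {}
add(d, path, "url1", ["a", "b"], ["d"])
add(d, path, "url2", ["c"], ["e", "f"])  # buffer reaches MAX_URLS, so it is flushed
add(d, path, "url1", ["x"], [])
flush(d, path)  # final flush of whatever is still in memory

with shelve.open(path) as shelf:
    result = dict(shelf)
```

The nested lists and dicts are pickled by `shelve` itself, so no `eval` is involved.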
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to write temporary data to file?

2007-01-09 Thread Thomas Ploch
Laszlo Nagy wrote:
> Thomas Ploch wrote:
>> Hi folks,
>>
>> I have a data structure that looks like this:
>>
>> d = {
>> 'url1': {
>> 'emails': ['a', 'b', 'c',...],
>> 'matches': ['d', 'e', 'f',...]
>> },
>> 'url2': {...
>> }
>>
>> This dictionary will get _very_ big, so I want to write it somehow to a
>> file after it has grown to a certain size.
>>
>> How would I achieve that?
>>   
> How about dbm/gdbm? Since urls are strings, you can store this dict in a
> database instance and actually use it from your program as if it were a dict.
> 
>   Laszlo
> 

Well, but how do I save the nested dict values? I don't want to eval
them, so this is no option for me.

Thomas


Re: How to write temporary data to file?

2007-01-09 Thread Laszlo Nagy
Thomas Ploch wrote:
> Hi folks,
>
> I have a data structure that looks like this:
>
> d = {
>   'url1': {
>   'emails': ['a', 'b', 'c',...],
>   'matches': ['d', 'e', 'f',...]
>   },
>   'url2': {...
> }
>
> This dictionary will get _very_ big, so I want to write it somehow to a
> file after it has grown to a certain size.
>
> How would I achieve that?
>   
How about dbm/gdbm? Since urls are strings, you can store this dict in a
database instance and actually use it from your program as if it were a dict.

   Laszlo
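
A rough sketch of the dbm idea, assuming Python 3, where the `dbm` package picks whichever backend is available. A dbm file stores only bytes, so the nested record has to be serialized first; `json` is used here as one option that needs no `eval`. The file name and the sample record are made up for the illustration.

```python
import dbm
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "crawl_db")  # hypothetical location

record = {"emails": ["a", "b", "c"], "matches": ["d", "e", "f"]}

# "c" opens the database for reading and writing, creating it if needed.
with dbm.open(path, "c") as db:
    # Keys and values are stored as bytes, so encode both explicitly.
    db["url1".encode("utf-8")] = json.dumps(record).encode("utf-8")

# Reopen read-only and decode the stored value back into a nested dict.
with dbm.open(path, "r") as db:
    loaded = json.loads(db["url1".encode("utf-8")])
```

Only the record for the requested key is read and decoded, so the whole dictionary never has to sit in memory.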



Re: How to write temporary data to file?

2007-01-09 Thread Marc 'BlackJack' Rintsch
Thomas Ploch wrote:

> d = {
>   'url1': {
>   'emails': ['a', 'b', 'c',...],
>   'matches': ['d', 'e', 'f',...]
>   },
>   'url2': {...
> }
> 
> This dictionary will get _very_ big, so I want to write it somehow to a
> file after it has grown to a certain size.
> 
> How would I achieve that?

If you want easy access to single 'url' keys then `shelve` might be an
alternative to pickling the whole thing as one big object.

Ciao,
Marc 'BlackJack' Rintsch
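
To illustrate the per-key access mentioned above, here is a small sketch assuming Python 3 and the default `writeback=False`. The sample records are invented. It also shows one thing to watch for with nested values: changing a list inside a stored record only changes a temporary copy unless the record is assigned back.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "urls_shelf")  # hypothetical location

with shelve.open(path) as shelf:
    shelf["url1"] = {"emails": ["a"], "matches": ["d"]}
    shelf["url2"] = {"emails": [], "matches": ["e"]}

with shelve.open(path) as shelf:
    # Each lookup unpickles only the record for that key.
    shelf["url1"]["emails"].append("lost")  # changes a temporary copy only

    # Fetch, modify, and assign back to make the change persistent.
    record = shelf["url1"]
    record["emails"].append("b")
    shelf["url1"] = record

    emails = shelf["url1"]["emails"]
```

Opening the shelf with `writeback=True` avoids the reassignment, at the cost of caching every accessed record in memory until the shelf is synced or closed.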


Re: How to write temporary data to file?

2007-01-08 Thread Thomas Ploch
Ravi Teja wrote:
> Thomas Ploch wrote:
>> Ravi Teja wrote:
>>> Thomas Ploch wrote:
>>>> Hi folks,
>>>>
>>>> I have a data structure that looks like this:
>>>>
>>>> d = {
>>>>   'url1': {
>>>>   'emails': ['a', 'b', 'c',...],
>>>>   'matches': ['d', 'e', 'f',...]
>>>>   },
>>>>   'url2': {...
>>>> }
>>>>
>>>> This dictionary will get _very_ big, so I want to write it somehow to a
>>>> file after it has grown to a certain size.
>>>>
>>>> How would I achieve that?
>>>>
>>>> Thanks,
>>>> Thomas
>>> Pickle/cPickle are standard library modules that can persist data.
>>> But in this case, I would recommend ZODB/Durus.
>>>
>>> (Your code example scares me. I hope you have benevolent purposes for
>>> that application.)
>>>
>>> Ravi Teja.
>>>
>> Thanks, but why is this code example scaring you?
>>
>> Thomas
> 
> The code indicates that you are trying to harvest a _very_ (as you put
> it) large set of email addresses from web pages. With my limited
> imagination, I can think of only one group of people who would need to
> do that. But considering that you write good English, you must not be
> one of those mean people that needed me to get a new email account just
> for posting to Usenet :-).
> 
> Ravi Teja.
> 

Oh, well, yes, you are right that this application is able to harvest
email addresses. But it can do much more than that. It has a text
matching engine that, depending on the given meta keywords, decides
whether to scan documents on the web, and it can harvest all kinds of
information. It can also be fed with callbacks for each of the
Content-Types. I know that the email matching engine is something of a
'grey zone', and I asked myself whether it needs the email stuff. But
you could easily add the email regex to the text matching engine
yourself, so I decided to include this functionality (it is 'OFF' by
default :-) ).

Thomas

P.S.: No, I am a good person.



Re: How to write temporary data to file?

2007-01-08 Thread Ravi Teja

Thomas Ploch wrote:
> Ravi Teja wrote:
> > Thomas Ploch wrote:
> >> Hi folks,
> >>
> >> I have a data structure that looks like this:
> >>
> >> d = {
> >>'url1': {
> >>'emails': ['a', 'b', 'c',...],
> >>'matches': ['d', 'e', 'f',...]
> >>},
> >>'url2': {...
> >> }
> >>
> >> This dictionary will get _very_ big, so I want to write it somehow to a
> >> file after it has grown to a certain size.
> >>
> >> How would I achieve that?
> >>
> >> Thanks,
> >> Thomas
> >
> > Pickle/cPickle are standard library modules that can persist data.
> > But in this case, I would recommend ZODB/Durus.
> >
> > (Your code example scares me. I hope you have benevolent purposes for
> > that application.)
> >
> > Ravi Teja.
> >
>
> Thanks, but why is this code example scaring you?
>
> Thomas

The code indicates that you are trying to harvest a _very_ (as you put
it) large set of email addresses from web pages. With my limited
imagination, I can think of only one group of people who would need to
do that. But considering that you write good English, you must not be
one of those mean people that needed me to get a new email account just
for posting to Usenet :-).

Ravi Teja.



Re: How to write temporary data to file?

2007-01-08 Thread Thomas Ploch
Ravi Teja wrote:
> Thomas Ploch wrote:
>> Hi folks,
>>
>> I have a data structure that looks like this:
>>
>> d = {
>>  'url1': {
>>  'emails': ['a', 'b', 'c',...],
>>  'matches': ['d', 'e', 'f',...]
>>  },
>>  'url2': {...
>> }
>>
>> This dictionary will get _very_ big, so I want to write it somehow to a
>> file after it has grown to a certain size.
>>
>> How would I achieve that?
>>
>> Thanks,
>> Thomas
> 
> Pickle/cPickle are standard library modules that can persist data.
> But in this case, I would recommend ZODB/Durus.
> 
> (Your code example scares me. I hope you have benevolent purposes for
> that application.)
> 
> Ravi Teja.
> 

Thanks, but why is this code example scaring you?

Thomas


Re: How to write temporary data to file?

2007-01-08 Thread Ravi Teja

Thomas Ploch wrote:
> Hi folks,
>
> I have a data structure that looks like this:
>
> d = {
>   'url1': {
>   'emails': ['a', 'b', 'c',...],
>   'matches': ['d', 'e', 'f',...]
>   },
>   'url2': {...
> }
>
> This dictionary will get _very_ big, so I want to write it somehow to a
> file after it has grown to a certain size.
>
> How would I achieve that?
>
> Thanks,
> Thomas

Pickle/cPickle are standard library modules that can persist data.
But in this case, I would recommend ZODB/Durus.

(Your code example scares me. I hope you have benevolent purposes for
that application.)

Ravi Teja.
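
For reference, a minimal sketch of the pickle route, assuming Python 3 (where the C implementation formerly imported as `cPickle` is used automatically by `pickle`). The sample dictionary and file name are invented for the illustration. This writes the whole dict as one object, so it suits periodic snapshots more than incremental updates.

```python
import os
import pickle
import tempfile

# Hypothetical sample data in the shape described in the question.
d = {
    "url1": {"emails": ["a", "b", "c"], "matches": ["d", "e", "f"]},
    "url2": {"emails": [], "matches": ["g"]},
}

path = os.path.join(tempfile.mkdtemp(), "crawl.pickle")

# Pickle files are binary, so open in "wb" / "rb" mode.
with open(path, "wb") as f:
    pickle.dump(d, f, protocol=pickle.HIGHEST_PROTOCOL)

with open(path, "rb") as f:
    restored = pickle.load(f)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.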



How to write temporary data to file?

2007-01-08 Thread Thomas Ploch
Hi folks,

I have a data structure that looks like this:

d = {
'url1': {
'emails': ['a', 'b', 'c',...],
'matches': ['d', 'e', 'f',...]
},
'url2': {...
}

This dictionary will get _very_ big, so I want to write it somehow to a
file after it has grown to a certain size.

How would I achieve that?

Thanks,
Thomas