GitHub user mahendra opened a pull request:

    https://github.com/apache/libcloud/pull/75

    LIBCLOUD-254 : Generator based iteration instead of LazyList

    ```LazyList``` was introduced in LIBCLOUD-78 to allow more efficient
iteration over objects stored in a container (S3, CloudFiles etc. limit the
maximum number of objects returned in a single call).
    
    ```LazyList``` solved this problem, but I think it might have the following
issues when handling containers with a large number of objects:
    1. It loads the entire list into memory.
    2. The caller has to wait for the entire list to be loaded into memory before
any operation can be done.
    3. The API invocation using ```get_more()``` and ```value_list``` is a bit
complex (as in, it can be simplified).
    
    By using Python generators, the above problems can be alleviated. Results
can be returned to the caller as and when they are received from the server.
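
    A minimal sketch of the idea, not the code in this pull request. The names ```fetch_page()```, ```SERVER_CALLS``` and ```iterate_objects()``` are hypothetical and only simulate a server that returns a limited number of objects per call:

```python
SERVER_CALLS = []  # records each simulated server call, for illustration only


def fetch_page(marker=None, page_size=3):
    """Hypothetical stand-in for one server call: returns (objects, next_marker)."""
    SERVER_CALLS.append(marker)
    data = ["obj-%d" % i for i in range(8)]
    start = marker or 0
    page = data[start:start + page_size]
    end = start + len(page)
    next_marker = end if end < len(data) else None
    return page, next_marker


def iterate_objects():
    """Yield objects one page at a time, as each response arrives."""
    marker = None
    while True:
        page, marker = fetch_page(marker)
        for obj in page:
            yield obj
        if marker is None:
            break


objects = iterate_objects()
first = next(objects)  # only the first page has been requested at this point
```

    The caller gets the first object after a single request, and further pages are fetched only if iteration continues.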
    
    The following changes were made to the libcloud APIs:
    * A new API, ```iterate_container_objects()```, was introduced. Storage
drivers need to implement this instead of
```list_container_objects()```. This API returns a generator. Using it
alleviates the three problems above.
    * ```list_container_objects()``` now simply returns
```list(self.iterate_container_objects(container))```. It is kept for
backwards compatibility, but it would be better if users started using the
```iterate_*()``` APIs instead.
    * The same changes have been made to the DNS base class.
    * ```LazyList()``` can be removed from libcloud if everyone is OK with that.
    * The generator-based interface can be used (WIP) to provide paginated
access to objects. This will be useful for web pages/apps where the user has to
paginate through the results. It can be implemented by providing
```start_key``` and ```count``` parameters (similar to CouchDB) instead of
generating the entire list and then applying an offset. This will be more
performance- and memory-efficient than generating the entire list for every
request.
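
    The relationship between the two methods can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: ```SketchStorageDriver``` and ```InMemoryDriver``` are hypothetical, simplified stand-ins and not the real libcloud classes.

```python
import itertools


class SketchStorageDriver(object):
    """Simplified stand-in for a storage driver base class (hypothetical)."""

    def iterate_container_objects(self, container):
        # Drivers override this method and return a generator.
        raise NotImplementedError("iterate_container_objects not implemented")

    def list_container_objects(self, container):
        # Backwards-compatible wrapper: materialise the generator into a list.
        return list(self.iterate_container_objects(container))


class InMemoryDriver(SketchStorageDriver):
    """Hypothetical driver that serves object names from a dict."""

    def __init__(self, containers):
        self.containers = containers
        self.yielded = 0  # counts how many objects were actually produced

    def iterate_container_objects(self, container):
        for name in self.containers[container]:
            self.yielded += 1
            yield name


driver = InMemoryDriver({"photos": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]})

# Take only the first two objects; the rest are never produced.
first_two = list(itertools.islice(driver.iterate_container_objects("photos"), 2))
produced_so_far = driver.yielded

# Existing callers can keep using the list-returning method.
everything = driver.list_container_objects("photos")
```

    Here ```itertools.islice``` stops after two objects, so a caller that needs only part of a container does not pay for the whole listing, while ```list_container_objects()``` still returns a plain list.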

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mahendra/libcloud lazy

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/libcloud/pull/75.patch

----
commit 1908c0319d1a13edddc3ec689093e42bd93c0acf
Author: Mahendra M <[email protected]>
Date:   2012-11-07T11:26:11Z

    LIBCLOUD-254 : Generator based iteration instead of LazyList

----
