[ https://issues.apache.org/jira/browse/LIBCLOUD-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahendra M updated LIBCLOUD-254:
--------------------------------

    Description: 
LazyList was introduced in LIBCLOUD-78 to iterate efficiently over the objects 
stored in a container, because providers such as S3 and CloudFiles limit the 
number of objects returned by a single call.

LazyList solved this problem, but I think it has the following issues when 
handling containers with a large number of objects:
1) It loads the entire list into memory.
2) The caller has to wait until the entire list is loaded before any operation 
can be done.
3) The API, driven by get_more() and value_list, is more complex than it needs 
to be.

Python generators alleviate these problems: results can be handed to the 
caller as soon as each batch is returned by the server.
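A minimal sketch of the generator approach, assuming a hypothetical paged call `fetch_page(marker)` that returns one batch of names plus a marker for the next batch (None when there are no more). This is not libcloud driver code; `iterate_paged`, `fake_fetch_page` and `DATA` are illustrative names only. The point is that a new request is issued only when the caller has consumed the previous batch, so nothing forces the whole listing into memory.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, next_marker); next_marker is None when the listing is exhausted.
Page = Tuple[List[str], Optional[str]]


def iterate_paged(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield items batch by batch; the next batch is requested lazily."""
    marker = None
    while True:
        items, marker = fetch_page(marker)
        yield from items
        if marker is None:
            return


# Hypothetical server: returns at most 2 names per call and records each request.
DATA = ["obj-%d" % i for i in range(5)]
requests = []


def fake_fetch_page(marker: Optional[str]) -> Page:
    start = 0 if marker is None else DATA.index(marker) + 1
    requests.append(start)
    page = DATA[start:start + 2]
    more = start + 2 < len(DATA)
    return page, (page[-1] if more else None)


stream = iterate_paged(fake_fetch_page)
first = next(stream)  # only the first batch has been requested at this point
rest = list(stream)   # the remaining batches are fetched on demand
```

The caller can start working on `first` after a single round trip, whereas a fully loaded list would need every batch before returning anything.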

The following changes were made to the libcloud APIs:
1) A new API, iterate_container_objects(), was introduced. Storage drivers 
need to implement it instead of list_container_objects(). It returns a 
generator, and using it alleviates all three problems above.
2) list_container_objects() simply returns 
list(self.iterate_container_objects(container)). It is kept for backwards 
compatibility, but users are encouraged to switch to the iterate_*() APIs.
3) The same changes have been made to the DNS base class.
4) LazyList can be removed from libcloud if everyone agrees.
5) The generator-based interface can be used (WIP) to provide paginated access 
to objects, which is useful for web pages and apps where the user pages through 
the results. Pagination can be implemented by accepting "start_key" and "count" 
parameters (similar to CouchDB) instead of generating the entire list and then 
applying an offset. This uses less time and memory than generating the entire 
list on every request.
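The driver-side pattern and the start_key/count pagination idea can be sketched together. This assumes a hypothetical, simplified StorageDriver base class and a toy InMemoryDriver whose simulated "server" returns at most page_size object names per call; it is not libcloud's actual base class, and get_page() is a hypothetical helper. Only iterate_container_objects() is implemented by the driver; list_container_objects() is the backwards-compatible wrapper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


class StorageDriver:
    """Hypothetical, simplified base class (not libcloud's real code)."""

    def iterate_container_objects(self, container: str,
                                  start_key: Optional[str] = None) -> Iterator[str]:
        # Concrete drivers override this and yield objects batch by batch.
        raise NotImplementedError

    def list_container_objects(self, container: str) -> List[str]:
        # Backwards-compatible wrapper: materialises the whole generator.
        return list(self.iterate_container_objects(container))


class InMemoryDriver(StorageDriver):
    """Toy driver holding a single container; each call returns <= page_size names."""

    def __init__(self, objects: Iterable[str], page_size: int = 2):
        self._objects = sorted(objects)
        self._page_size = page_size
        self.calls = 0  # number of simulated API requests

    def _fetch_page(self, marker: Optional[str]) -> List[str]:
        # Simulated API call: names strictly after `marker`, limited to page_size.
        self.calls += 1
        remaining = [o for o in self._objects if marker is None or o > marker]
        return remaining[:self._page_size]

    def iterate_container_objects(self, container, start_key=None):
        marker = start_key  # the server resumes after this key; no client-side offset
        while True:
            page = self._fetch_page(marker)
            yield from page
            if len(page) < self._page_size:
                return  # a short (or empty) page means the listing is exhausted
            marker = page[-1]


def get_page(driver: StorageDriver, container: str,
             start_key: Optional[str] = None, count: int = 10) -> List[str]:
    """Hypothetical CouchDB-style pagination: at most `count` objects after `start_key`."""
    return list(islice(driver.iterate_container_objects(container, start_key), count))
```

Because islice() stops pulling from the generator once `count` items are taken, get_page() only triggers as many requests as the requested window needs, rather than building the full list and slicing it.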


> Provide generator based iteration instead of LazyList
> -----------------------------------------------------
>
>                 Key: LIBCLOUD-254
>                 URL: https://issues.apache.org/jira/browse/LIBCLOUD-254
>             Project: Libcloud
>          Issue Type: Improvement
>          Components: Core, Storage
>            Reporter: Mahendra M
>            Priority: Minor

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
