On Thu, Aug 22, 2019 at 6:16 PM ALT-EMAIL Virilo Tejedor wrote:

> thanks Barry,
>
> I'm not sure parallelizing would work, because it takes seconds
> to get a single image from this bucket.
>

If a single image has a latency of, say, 2 seconds, plus 2 seconds to
download, that's 4 seconds per image, or about 15 per minute.

But if you download, say, 10 in parallel, then that's 150 per minute, even
with the latency per image.
Overall the latency gets spread around. Your script wouldn't be waiting on
all 10 at the same time, so while it is waiting for one to start, it can be
downloading a different one.
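
To make the overlap concrete, here is a rough sketch using Python's standard thread pool. It does not touch S3 at all: `fetch_image` is a hypothetical stand-in that just sleeps to imitate latency plus transfer time (scaled down to milliseconds so it runs quickly), and the key names are made up. In a real script you would swap in whatever client call you actually use.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_image(key, latency_s=0.02, transfer_s=0.02):
    """Hypothetical stand-in for one bucket GET (no network involved)."""
    time.sleep(latency_s)    # waiting for the request to start returning data
    time.sleep(transfer_s)   # transferring the image body
    return key

def download_all(keys, workers):
    """Fetch every key with `workers` threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fetch_image, keys))  # results keep input order
    return results, time.perf_counter() - start

keys = [f"img-{i}.jpg" for i in range(20)]  # made-up object names
serial_results, serial_time = download_all(keys, workers=1)
parallel_results, parallel_time = download_all(keys, workers=10)
print(f"1 worker: {serial_time:.2f}s, 10 workers: {parallel_time:.2f}s")
```

Each image still takes the full latency plus transfer time, but with 10 workers those waits overlap, so the total wall-clock time drops sharply.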

AWS does not store all 10 million images on the *same physical disk*. They
are possibly spread over *millions* of disks.



>
> I should open several threads, and I'm probably going to have problems
> with other limitations
>

In theory yes. But AWS will cope with high concurrency. It's designed that
way. You could easily download 1,000 images concurrently, if you had the
bandwidth.


But however you download the data (even if it's just to upload it elsewhere -
I still don't understand why), you will have to deal with this latency to
download it all in a realistic timeframe.

Downloading them all one by one might take 463 days ;)
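
For reference, that figure comes from the rough numbers used earlier in this message (10 million images, about 4 seconds each). A quick back-of-envelope check, assuming throughput scales linearly with concurrency (which ignores bandwidth limits); `days_to_download` is just an illustrative helper:

```python
SECONDS_PER_DAY = 86_400

def days_to_download(num_images, seconds_per_image, concurrency=1):
    """Estimated wall-clock days, assuming perfect scaling with concurrency."""
    return num_images * seconds_per_image / concurrency / SECONDS_PER_DAY

for workers in (1, 10, 1000):
    days = days_to_download(10_000_000, 4, workers)
    print(f"{workers:>4} concurrent: {days:.1f} days")
```

So 10 at a time brings it down to roughly 46 days, and 1,000 at a time to about half a day, bandwidth permitting.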
