On 5/12/2013 1:54 AM, Peter Palfrader wrote:
> I wonder if we can somehow, somewhere tag files that we got from non-US
> or archive.d.o (which also covers non-US) and no other tree. Somebody
> would have to write code for that.
Do you have a record somewhere of where the files originated? We can
add arbitrary metadata as headers to objects if we want to; that
metadata is served back as headers when you GET/HEAD the object
(file).
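Something like this would do it at upload time - an untested sketch
with boto; the 'origin' metadata key and the file names are just
placeholders, not anything pre-agreed:

from boto.s3.connection import S3Connection

s3 = S3Connection('<aws access key>', '<aws secret key>')
bucket = s3.get_bucket('aws.snapshot.debian.org')

# Tag the object when we upload it...
key = bucket.new_key('<file name>')
key.set_metadata('origin', 'non-US')  # served back as x-amz-meta-origin
key.set_contents_from_filename('<local path>')

# ...and read the tag back later on GET/HEAD:
key = bucket.get_key('<file name>')
print key.get_metadata('origin')  # -> 'non-US'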
>>> I wonder if we can replicate to a running postgres instance. If not, we
>>> might have to feed it individually, importing the dumps that the master
>>> produces. Thoughts?
>> A dump from the current master would be a good start. What size are they
>> (is it the 2.1 GB file I saw in there)? Peter, would you like the
>> credentials for this DB (also in US-East right now)? If so, can you give
>> me an IPv4 you'll be accessing it from?
> I'm not sure I can make use of DB access right now, thanks. When we
> still had a mirror at UBC, we used postgresql's DB replication feature
> to keep that mirror in sync. Is that an option with this instance?
Not right now - the replication supported today is wholly within the
AWS environment. The feature is Multi-AZ: synchronous block-level
replication from a host in one cluster of data centres (an Availability
Zone, or AZ) to a standby host in a second AZ.
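For reference, enabling it is a single flag at instance-creation time.
Another untested sketch with boto; the instance name, storage size and
instance class are placeholders:

import boto.rds

rds = boto.rds.connect_to_region('us-east-1')
# multi_az=True provisions a synchronous standby in a second AZ;
# failover to the standby is automatic on host or AZ failure.
db = rds.create_dbinstance('snapshot-db', 100, 'db.m1.large',
                           'master', '<password>', multi_az=True)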
> OTOH, we may not necessarily need a DB at amazon. It should certainly
> be possible to separate backend hosts from frontend from database hosts.
Absolutely - completely your choice.
See attached for an untested example to 'restore' a given file that has
been archived; this is not a working example, just an initial sketch.
The user credentials I sent you (off-list, naturally) for data ingest to
S3 do not have access to call this restore at this time; we'll set up a
separate user with access only to restore (and not ingest) for use in
this script - but the concept is simple:
* We'll default to restoring a file to live storage for 14 days (after
which the duplicate in live storage is automatically removed)
* We'll limit restores from archive to 100 files per day
Feel free to edit the parameters above - but hopefully this shows you
how it all plugs together.
James
(Going to bed now - 12:30am here at AWST+0800)
--
/Mobile:/ +61 422 166 708, /Email:/ james_AT_rcpt.to
#!/usr/bin/python
# vi: ft=python
#
# Untested sketch: restore a file from Glacier archive back into live
# S3 storage, with a daily restore quota tracked in SimpleDB.

import datetime
import sys

import boto
from boto.s3.connection import S3Connection

restore_time_days = 14   # days the restored copy stays in live storage
restores_per_day = 100   # daily quota of restores from archive
bucket_name = 'aws.snapshot.debian.org'
simpledb_table = 'snapshot.debian.org'

s3_conn = S3Connection('<aws access key>', '<aws secret key>')
bucket = s3_conn.get_bucket(bucket_name)

sdb_conn = boto.connect_sdb('<aws access key>', '<aws secret key>')
sdb_conn.create_domain(simpledb_table)  # no-op if it already exists

def check_file_is_archived(file_name):
    # LIST (unlike HEAD) reports the storage class of each key.
    for key in bucket.list(prefix=file_name):
        if key.name == file_name:
            return key.storage_class == 'GLACIER'
    return False

def check_file_already_being_restored(file_name):
    # HEAD on a key with a restore in flight sets ongoing_restore.
    return bool(bucket.get_key(file_name).ongoing_restore)

def check_daily_restores():
    # True if we are still under today's quota.
    sdb_domain = sdb_conn.get_domain(simpledb_table)
    item = sdb_domain.get_item(str(datetime.date.today()))
    return item is None or int(item['count']) < restores_per_day

def restore_file(file_name, days):
    # Ask S3 to stage a temporary copy in live storage for `days` days.
    key = bucket.get_key(file_name)
    key.restore(days)

def update_daily_restores():
    # Bump today's restore counter (one SimpleDB item per day).
    sdb_domain = sdb_conn.get_domain(simpledb_table)
    item_name = str(datetime.date.today())
    item = sdb_domain.get_item(item_name)
    if item is None:
        sdb_domain.put_attributes(item_name, {'count': 1})
    else:
        item['count'] = int(item['count']) + 1
        item.save()

def process(file_name):
    if not check_file_is_archived(file_name):
        return "File is not archived."
    elif check_file_already_being_restored(file_name):
        return ("File is already being restored - please wait 3 - 5 hours "
                "from the initial restore")
    elif not check_daily_restores():
        return "Too many restores have been done today - come back tomorrow"
    else:
        restore_file(file_name, restore_time_days)
        update_daily_restores()
        return ("Your file has been scheduled for restore - please try and "
                "access it in 3 - 5 hours")

def main():
    # If this is web accessible, probably drop some HTML in here...
    print process(sys.argv[1])

if __name__ == '__main__':
    main()
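
A hypothetical invocation, once main() is wired to sys.argv as above
(the file name is a placeholder):

$ ./restore.py '<file name>'
Your file has been scheduled for restore - please try and access it in 3 - 5 hours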