Thanks again, Tom. I'll likely wait a bit longer at this point. I also 
need/want to use the backup files as legitimate backup files (with 
associated restore capability) and have them surface in the Datastore Admin 
tool for operational simplicity/clarity.

I'm really trying to leverage stock tooling where backup and data 
replication are involved. We have >70 GAE applications that all need to 
perform similar processes, and managing custom code across all of those for 
critical services doesn't sit well with me.

Hopefully your progress is swift!
j

On Tuesday, 9 July 2013 17:17:08 UTC-6, Tom Kaitchuck wrote:
>
> You don't have to use the datastore backup format to get data into 
> BigQuery. (You could if you wanted; the code is open source.) But BigQuery 
> supports JSON and CSV directly, which is often easier. 
> We actually have a guide on extracting data from Datastore and using a 
> MapReduce to load it into BigQuery here: 
> https://developers.google.com/bigquery/articles/datastoretobigquery
> This guide was specifically made because it is quite common to want to 
> either import only a subset of the data into BigQuery or to first run some 
> transform on it.
>
> Following the guide it shouldn't take much effort to get things up and 
> working. In the SVN there is already an output writer (still being tested) 
> included in the mapreduce library to write using the GCS client library, 
> which should allow you to run without using the Files API at all. (You can 
> use it now; its name starts with an "_" because we are still testing it. 
> Once that work is completed, it will be renamed to remove the "_".)
>
> We are actively working on baking all of this in, to make things much 
> easier, but I can't commit to a specific date. If your only concern is 
> whether this is going to happen in a timely way, then you should wait. That 
> being said, running your own MR a la the example above is a good option 
> anyway, as it gives you more control over exactly what you are putting into 
> BigQuery and the format it is in. 
>
>
>
> On Tue, Jul 9, 2013 at 6:30 AM, Jason Collins <jason.a...@gmail.com> wrote:
>
>> Thanks Tom.
>>
>> The next part of my story is that we use the backup files with a BigQuery 
>> ingestion job - that is, the BigQuery ingestion job uses the native output 
>> from the Datastore Admin Scheduled Backups stored on Cloud Storage. 
>> ('sourceFormat': 'DATASTORE_BACKUP')
>>
>> Presumably, I'd also have to replicate the format if I were to roll my 
>> own GCS/MapReduce hybrid and continue to use the same BigQuery ingestion 
>> approach.
>>
>> Any suggestions on that front? Or maybe just an approximate ETA and save 
>> me a bunch of work? ;)
>>
>> j
>>
>>
>> On Monday, 8 July 2013 18:23:38 UTC-6, Tom Kaitchuck wrote:
>>
>>> This is something we are working hard on. We're updating many code paths 
>>> to fix a lot of issues and migrate over to the GCS client. Changes won't 
>>> roll out in one big release, rather updates will be released as they are 
>>> completed. If you don't want to wait, it is absolutely supported to use the 
>>> GCS client to write out data from within your own MapReduce. 
>>>
>>>
>>> On Thu, Jul 4, 2013 at 9:01 AM, Jason Collins <jason.a...@gmail.com> wrote:
>>>
>>>> Our backups are really flakey and take a long time, apparently because 
>>>> of the sketchy Files API link.
>>>>
>>>> Will the GAE 1.8.2 release of Datastore Admin Scheduled Backup tool 
>>>> support the new Cloud Storage Client Library?
>>>>
>>>> This is all wrapped up in MapReduce/Pipelines, so there are a lot of 
>>>> moving parts. My goal is to simply use stock, Google-supplied tools and 
>>>> have this stuff be resilient and predictable.
>>>>
>>>> Any info is appreciated. If something in this area is not imminent, we 
>>>> will have to start looking for alternate ways to achieve reliable backups.
>>>>
>>>> Thanks,
>>>> j
>>>>  
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Google App Engine" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to google-appengi...@googlegroups.com.
>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>>
>>>> Visit this group at http://groups.google.com/group/google-appengine.
>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>  
>>>>  
>>>>
>>>
>
>

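For reference, the alternative Tom describes above (writing JSON from your own MapReduce and loading that, rather than the DATASTORE_BACKUP output) could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names, the bucket path, and the project/dataset/table IDs are hypothetical, entities are represented as plain dicts, and the schema that BigQuery generally needs for JSON loads is left out. The field names in the job body are meant to follow the BigQuery v2 jobs API; nothing here is taken from the mapreduce library itself.

```python
import json
from datetime import datetime


def entity_to_json_line(entity):
    """Serialize one entity-like dict as a single line of JSON."""
    def encode(value):
        # Datetimes are written as "YYYY-MM-DD HH:MM:SS" strings.
        if isinstance(value, datetime):
            return value.strftime("%Y-%m-%d %H:%M:%S")
        raise TypeError("Cannot serialize %r" % (value,))
    return json.dumps(entity, default=encode, sort_keys=True) + "\n"


def build_load_job(project_id, dataset_id, table_id, source_uri,
                   source_format="NEWLINE_DELIMITED_JSON"):
    """Build a load-job request body. Nothing is sent to BigQuery here."""
    return {
        "configuration": {
            "load": {
                "sourceUris": [source_uri],
                # 'DATASTORE_BACKUP' is the value used with the stock
                # Datastore Admin backups mentioned in the thread.
                "sourceFormat": source_format,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical entity and destination, for illustration only.
    row = {"name": "example", "created": datetime(2013, 7, 9, 17, 17, 8)}
    print(entity_to_json_line(row))
    job = build_load_job("my-project", "my_dataset", "my_table",
                         "gs://my-bucket/export/part-0.json")
    print(json.dumps(job, indent=2, sort_keys=True))
```

A mapper in your own MapReduce would emit one such line per entity, and the resulting files on Cloud Storage would be the sourceUris of the load job. This does not give you restorable backups or anything visible in the Datastore Admin tool, which is the trade-off Jason raises at the top of the thread.
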