You don't have to use the Datastore backup format to get data into
BigQuery. (You could if you wanted to; the code is open source.) But BigQuery
supports JSON and CSV directly, which is often easier.
We actually have a guide on extracting data from Datastore and using a
MapReduce to load it into BigQuery here:
https://developers.google.com/bigquery/articles/datastoretobigquery
This guide was specifically made because it is quite common to want to
either import only a subset of the data into BigQuery or to first run some
transform on it.
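
As a rough illustration of the subset-and-transform step (this is not taken from the guide; the entity fields and the `to_bigquery_row` name are made up for the example), a map-style function can turn each entity into one line of newline-delimited JSON, which is the JSON layout BigQuery loads:

```python
import json

# Hypothetical set of properties to export; the names are illustrative only.
EXPORTED_FIELDS = ("user_id", "score")


def to_bigquery_row(entity):
    """Return one newline-terminated JSON line for a dict-like entity.

    Returns None for entities that should be skipped, so that only a
    subset of the data is exported.
    """
    if entity.get("deleted"):
        return None
    row = {name: entity.get(name) for name in EXPORTED_FIELDS}
    # Example transform: normalise the score to a float.
    row["score"] = float(row["score"] or 0)
    return json.dumps(row, sort_keys=True) + "\n"


entities = [
    {"user_id": "a", "score": 3, "deleted": False, "notes": "dropped"},
    {"user_id": "b", "score": 5, "deleted": True},
]
lines = [line for line in map(to_bigquery_row, entities) if line is not None]
```

Here the second entity is filtered out and the `notes` property is never exported, so `lines` holds a single row.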

Following the guide, it shouldn't take much effort to get things up and
working. In the SVN there is already an output writer (still being tested)
included in the mapreduce library to write using the GCS client library,
which should allow you to run without using the Files API at all. (You can
use it now. Its name starts with an "_" because we are still testing it; once
that work is completed, it will be renamed to remove the "_".)
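
A minimal sketch of the writing side, assuming the rows are already serialised as lines. It only relies on a file-like object with a `write` method, so it is shown here with an in-memory buffer; inside App Engine the object would come from the GCS client library (I believe `cloudstorage.open(path, 'w')` returns one, but check that against the version you have). The `write_rows` helper is a name made up for this example and is not part of the mapreduce library.

```python
import io


def write_rows(fileobj, lines):
    """Write pre-serialised lines to any writable file-like object.

    Returns the number of lines written.
    """
    count = 0
    for line in lines:
        fileobj.write(line)
        count += 1
    return count


# In-memory stand-in for a GCS file handle (an assumption for illustration).
buf = io.StringIO()
written = write_rows(buf, ['{"user_id": "a"}\n', '{"user_id": "b"}\n'])
output = buf.getvalue()
```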

We are actively working on baking all of this in, to make things much
easier, but I can't commit to a specific date. If your only concern is whether
this is going to happen in a timely way, then you should wait. That being said,
running your own MR a la the example above is a good option anyway, as it
gives you more control over exactly what you are putting into BigQuery
and the format it is in.
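
For reference, a sketch of what the load job could look like once your own MapReduce has written newline-delimited JSON to Cloud Storage. The field names follow the BigQuery v2 `jobs.insert` load configuration as I understand it; the bucket, project, dataset, table and schema values are placeholders, and `build_load_job` is a hypothetical helper. With this approach `sourceFormat` is `NEWLINE_DELIMITED_JSON` rather than `DATASTORE_BACKUP`, and you supply the schema yourself.

```python
def build_load_job(source_uris, project_id, dataset_id, table_id, fields):
    """Build the request body for a BigQuery load job (jobs.insert)."""
    return {
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "sourceFormat": "NEWLINE_DELIMITED_JSON",
                "schema": {"fields": list(fields)},
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Replace the table contents on each load (illustrative choice).
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# All values below are placeholders.
job = build_load_job(
    source_uris=["gs://example-bucket/export/output-0"],
    project_id="example-project",
    dataset_id="example_dataset",
    table_id="example_table",
    fields=[
        {"name": "user_id", "type": "STRING"},
        {"name": "score", "type": "FLOAT"},
    ],
)
```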



On Tue, Jul 9, 2013 at 6:30 AM, Jason Collins <jason.a.coll...@gmail.com> wrote:

> Thanks Tom.
>
> The next part of my story is that we use the backup files with a BigQuery
> ingestion job - that is, the BigQuery ingestion job uses the native output
> from the Datastore Admin Scheduled Backups stored on Cloud Storage.
> ('sourceFormat': 'DATASTORE_BACKUP')
>
> Presumably, I'd also have to replicate the format if I were to roll my own
> GCS/MapReduce hybrid and continue to use the same BigQuery ingestion
> approach.
>
> Any suggestions on that front? Or maybe just an approximate ETA and save
> me a bunch of work? ;)
>
> j
>
>
> On Monday, 8 July 2013 18:23:38 UTC-6, Tom Kaitchuck wrote:
>
>> This is something we are working hard on. We're updating many code paths
>> to fix a lot of issues and migrate over to the GCS client. Changes won't
>> roll out in one big release, rather updates will be released as they are
>> completed. If you don't want to wait, it is absolutely supported to use the
>> GCS client to write out data from within your own MapReduce.
>>
>>
>> On Thu, Jul 4, 2013 at 9:01 AM, Jason Collins <jason.a...@gmail.com> wrote:
>>
>>> Our backups are really flakey and take a long time, apparently because
>>> of the sketchy Files API link.
>>>
>>> Will the GAE 1.8.2 release of Datastore Admin Scheduled Backup tool
>>> support the new Cloud Storage Client Library?
>>>
>>> This is all wrapped up in MapReduce/Pipelines, so there are a lot of
>>> moving parts. My goal is to simply use stock, Google-supplied tools and
>>> have this stuff be resilient and predictable.
>>>
>>> Any info is appreciated. If something in this area is not imminent, we
>>> will have to start looking for alternate ways to achieve reliable backups.
>>>
>>> Thanks,
>>> j
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Google App Engine" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to google-appengi...@googlegroups.com.
>>> To post to this group, send email to google-a...@googlegroups.com.
>>>
>>> Visit this group at http://groups.google.com/group/google-appengine.
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>>
>>>
>>
