[appengine-java] Re: Bulk writes to datastore

2009-09-04 Thread Jason (Google)
Batch puts are supported, yes, and as of yesterday's release, calling
makePersistentAll (JDO) and the equivalent JPA call will take advantage of
this support (previously, you had to use the low-level API).

Two quick notes:

1) All of the entities that you're persisting should be in separate entity
groups, since two entities in the same entity group can't be written
concurrently, and you will see datastore timeout exceptions if many
simultaneous write requests come in for the same entity or entity group.
2) Batch puts do not operate in a transaction. This means that some writes
may succeed but others may not, so if you need the ability to rollback,
you'll need transactions.
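
For illustration, a minimal JDO sketch of such a batch put (untested; PMF
is assumed to be the usual PersistenceManagerFactory singleton from the
GAE docs, and Record a persistence-capable class of your own):

    import java.util.List;
    import javax.jdo.PersistenceManager;

    void batchPut(List<Record> records) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        try {
            // One batch put for the whole list instead of one
            // datastore round trip per entity.
            pm.makePersistentAll(records);
        } finally {
            pm.close();
        }
    }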

Let me know if you have any more questions on this.

- Jason

On Thu, Sep 3, 2009 at 7:24 PM, Nicholas Albion  wrote:

>
> Is it possible to overcome the datastore's 10 writes/second limit by
> batching them?
>
> I've got a table containing just over one million records (in CSV
> format).  Does a batched write (of around 1MB of data and, say 1000
> records) count as one write, or 1000 writes?




[appengine-java] Re: Bulk writes to datastore

2009-09-05 Thread Vince Bonfanti

Your two "quick notes" seem to be contradictory. In order to use
transactions, don't all of the entities have to be in the same entity
group?

Vince





[appengine-java] Re: Bulk writes to datastore

2009-09-06 Thread Nicholas Albion

On Sep 5, 10:24 am, "Jason (Google)"  wrote:
> Batch puts are supported, yes, and as of yesterday's release, calling
> makePersistentAll (JDO) and the equivalent JPA call will take advantage of
> this support (previously, you had to use the low-level API).
>
> Two quick notes:
>
> 1) All of the entities that you're persisting should be in separate entity
> groups, since two entities in the same entity group can't be written
> concurrently, and you will see datastore timeout exceptions if many
> simultaneous write requests come in for the same entity or entity group.

Sorry Jason, I'm a bit confused now.  Wouldn't that be the most common
use case for batch puts?  According to the GAE documentation, this is
the main point of entity groups:
  "App Engine creates related entities in entity groups automatically
to support updating related objects together"

...so you can add them together _logically_ but not chronologically?
I've got several cases where I'd have 50,000 to 1,000,000 records which
logically belong to a single parent entity.  If I need to add them to
the datastore individually, it's going to take somewhere between
2 and 24 hours to write them all (spread across multiple HTTP requests
in any case).  If I could batch put the data (within the same entity
group) I imagine that the time would be reduced significantly.

> 2) Batch puts do not operate in a transaction. This means that some writes
> may succeed but others may not, so if you need the ability to rollback,
> you'll need transactions.

Do you mean that if necessary, the call to makePersistentAll() should
be wrapped in a transaction, or that makePersistentAll() _can_not_ be
wrapped in a transaction?



[appengine-java] Re: Bulk writes to datastore

2009-09-08 Thread Jason (Google)
Yes. If you need to be able to roll back in case one or more entities don't
get written, you'll need to use transactions. If you use transactions, your
entities must belong to the same entity group or else an exception will be
thrown. You'll get better performance if you do this outside of a
transaction, since all entities can be written in parallel, but you'll lose
the ability to roll back in case of an individual failure.
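
A minimal sketch of the transactional variant (untested; PMF and Record
are again assumed app classes, and every record in the list must share
one entity group):

    import java.util.List;
    import javax.jdo.PersistenceManager;
    import javax.jdo.Transaction;

    void batchPutAtomically(List<Record> records) {
        PersistenceManager pm = PMF.get().getPersistenceManager();
        Transaction tx = pm.currentTransaction();
        try {
            tx.begin();
            pm.makePersistentAll(records); // all-or-nothing; same entity group
            tx.commit();
        } finally {
            if (tx.isActive()) {
                tx.rollback(); // commit never happened; undo the batch
            }
            pm.close();
        }
    }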

- Jason





[appengine-java] Re: Bulk writes to datastore

2009-09-08 Thread Jason (Google)
If you're trying to achieve high write throughput, as it sounds like you are
since you have 1,000,000 entities to write, you should be designing your
schema to minimize the number of entities in an entity group. These and
other general tips are listed here:

http://code.google.com/appengine/docs/python/datastore/keysandentitygroups.html#Entity_Groups_Ancestors_and_Paths

Putting all of your entities in a single group significantly impairs your
application's ability to update entities, since entities in one group can no
longer be written in parallel. Large entity groups will work fine if your
entities aren't being updated very often (generally 1-10 updates per second,
max), but if you want to do massive bulk writes like this, I suggest
re-thinking your design with this in mind. Even if you can't roll back the
entire write, a batch put of entities in separate entity groups will throw an
exception in the event of a failure, which you can catch and then retry the
write for the affected entity.
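
A rough sketch of that catch-and-retry pattern against the low-level API
(the retry count and exception handling here are illustrative assumptions,
not a recommended policy):

    import java.util.List;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.DatastoreTimeoutException;
    import com.google.appengine.api.datastore.Entity;

    void putWithRetry(List<Entity> batch) {
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        for (int attempt = 1; attempt <= 3; attempt++) {
            try {
                ds.put(batch); // no transaction: a failure throws mid-batch
                return;
            } catch (DatastoreTimeoutException e) {
                // Transient timeout; re-putting entities with complete keys
                // just overwrites them, so retrying the batch is safe.
            }
        }
        throw new RuntimeException("batch put failed after 3 attempts");
    }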

- Jason





[appengine-java] Re: Bulk writes to datastore

2009-09-08 Thread Larry Cable

Any documentation or comments on how JPA/JDO map their entities and
identities onto entity groups?





[appengine-java] Re: Bulk writes to datastore

2009-09-10 Thread Jason (Google)
All standalone entities are in their own entity group by default. To put an
entity in another entity's group, you use an owned relationship, and we have a
section in our docs for that:

http://code.google.com/appengine/docs/java/datastore/relationships.html#Relationships_Entity_Groups_and_Transactions
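
A rough JDO sketch of such an owned one-to-many relationship (class and
field names are invented for illustration); because each Child key gets
its Parent as ancestor, both ends land in the same entity group:

    import java.util.ArrayList;
    import java.util.List;
    import javax.jdo.annotations.IdGeneratorStrategy;
    import javax.jdo.annotations.PersistenceCapable;
    import javax.jdo.annotations.Persistent;
    import javax.jdo.annotations.PrimaryKey;
    import com.google.appengine.api.datastore.Key;

    @PersistenceCapable
    public class Parent {
        @PrimaryKey
        @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
        private Key key;

        @Persistent(mappedBy = "parent") // owned: children join this group
        private List<Child> children = new ArrayList<Child>();
    }

    @PersistenceCapable
    public class Child {
        @PrimaryKey
        @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
        private Key key;

        @Persistent
        private Parent parent;
    }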

- Jason





[appengine-java] Re: Bulk writes to datastore

2009-09-11 Thread Larry Cable

Thanks, Jason.




[appengine-java] Re: Bulk writes to datastore

2009-09-11 Thread Larry Cable

I tried doing a "bulk" load with the JDO makePersistentAll(..) call
yesterday ...

What I did was create a List of size 2048, fill it to capacity, and
then call makePersistentAll() ... I got an IllegalArgumentException out
of that call stating that you can only persist at most 500 objects per
call ...

I was unable to retest this because, despite changing the capacity of
the List to 500 and re-deploying, the redeployment did not seem to take
effect ...

will keep trying ...
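
For what it's worth, a sketch of staying under that cap by slicing the
list (untested; the 500 figure is simply the limit the exception above
reported, and PMF/Record are assumed app classes):

    import java.util.List;
    import javax.jdo.PersistenceManager;

    void putInChunks(List<Record> records) {
        final int MAX_BATCH = 500; // per-call limit from the exception above
        PersistenceManager pm = PMF.get().getPersistenceManager();
        try {
            for (int i = 0; i < records.size(); i += MAX_BATCH) {
                int end = Math.min(i + MAX_BATCH, records.size());
                pm.makePersistentAll(records.subList(i, end));
            }
        } finally {
            pm.close();
        }
    }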




[appengine-java] Re: Bulk writes to datastore

2009-09-11 Thread Larry Cable

So now, I am hitting Datastore timeouts and Request timeouts ...

I really, really think you guys need to add a mechanism that allows
developers to simply do bulk uploads of data into their GAE
applications (from Java, thank you).

:)




[appengine-java] Re: Bulk writes to datastore

2009-09-12 Thread P64

Exactly!

I was hoping this update (
http://code.google.com/p/datanucleus-appengine/issues/detail?id=7
) would seriously improve bulk inserts. It seems that in practice you
can now do roughly 2-3 times as many inserts in the same amount of
real and CPU time.

However, this is still poor compared to what we're used to with
relational databases on relatively modest hardware.

At the moment I can do a batch insert of up to 300 entities (with a
couple of Integer and String properties) in a 30-second time window,
and it costs me around 18 seconds of CPU time.

I have a roughly 1.5MB file which I have to download, parse its
15,000 lines, and insert them into the database. I need no transactions
in this case: all entities can be standalone, and I don't mind the
order in which they are written; they could be written in parallel as
well, as far as I am concerned. As it stands now, I have to download
this file, slice it into chunks of 300 lines, and store each chunk in
the database. Then I need to put 50 tasks in a queue, each taking 30
seconds to read a chunk from the database, parse it into 300 separate
entities, and store them as such.

The database writes alone would cost me well over 15 CPU
minutes, not to mention the overhead caused by all the task spawning
and so on.

All that for an operation which literally takes seconds on my old box
running a relational DB, which I use for testing purposes.

It's too complicated and it uses way too many resources to update 1000+
entities - and there are lots of applications that need to update data
from different sources (XML, SQL dumps, ...) on a daily basis.
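
For reference, a sketch of that task fan-out (the package shown is the
task queue API's later permanent location; when this thread was written
the same classes lived under com.google.appengine.api.labs.taskqueue,
and the worker URL and parameter names here are invented):

    import com.google.appengine.api.taskqueue.Queue;
    import com.google.appengine.api.taskqueue.QueueFactory;
    import com.google.appengine.api.taskqueue.TaskOptions;

    void enqueueImportChunks(int totalLines, int chunkSize) {
        Queue queue = QueueFactory.getDefaultQueue();
        for (int start = 0; start < totalLines; start += chunkSize) {
            // Each task invokes a worker servlet that parses one chunk
            // and batch-puts its entities within its own request deadline.
            queue.add(TaskOptions.Builder
                    .withUrl("/tasks/importChunk")
                    .param("start", Integer.toString(start))
                    .param("count", Integer.toString(chunkSize)));
        }
    }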




[appengine-java] Re: Bulk writes to datastore

2009-09-12 Thread Larry Cable

I am wondering about writing a Servlet that would accept form/multi-part
uploads of large files and cache them in memcache, then use the
cron API to "trickle" persist them into the DS over time ...

could maybe even get adventurous and put a "filesystem"-like API
over the cache ...

lemme know if anyone would be interested in this ... if there is
enough interest I'll put something together and open source it.

These timeouts in the DS and at the request level, while
"understandable", are a total PITA; coding around them complicates the
application logic considerably ...

honestly I am beginning to wonder if EC2 with Tomcat and a MySQL
or Derby DB in it might be a better way to go if you actually want
to store reasonable amounts of data.

I realize that this is an EA environment, but my feedback would be
that if you are aiming to provide a scalable web app and datastore,
there is little point in having either if the hosted application
cannot store reasonable amounts of data in the first place.
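
A minimal sketch of the staging step (key naming invented; note that
memcache is not durable, entries can be evicted at any time, and values
are capped at about 1MB):

    import com.google.appengine.api.memcache.MemcacheService;
    import com.google.appengine.api.memcache.MemcacheServiceFactory;

    void stageUpload(String uploadId, byte[] data) {
        MemcacheService cache = MemcacheServiceFactory.getMemcacheService();
        // A cron job or task can later pull this out and trickle-persist
        // it to the datastore in small batches. Evictions mean the
        // trickle job must tolerate missing entries.
        cache.put("upload:" + uploadId, data);
    }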





[appengine-java] Re: Bulk writes to datastore

2009-09-13 Thread Toby

Hi Larry,

not sure if this solution would be reliable.
In any case I am having the same trouble as you.  I have small
datasets of a few tens of thousands of entities and I keep hitting the
30-second boundary all the time.  I need to write Ajax queries to split
my requests when I try to process the data.  It is a big mess because
I spend so much time engineering stuff that I normally do not even
think about.  The problem is also that the write and delete operations
on the datastore are extremely slow. What would be helpful are
efficient bulk insert and bulk delete methods.  I still have some hope
that this will come one day.

I am wondering if moving to Amazon is a good alternative. What is good
about GAE is the integration with Java, JDO and so on.  I
think on EC2 you are all on your own.

Cheers,
Tobias






[appengine-java] Re: Bulk writes to datastore

2009-09-13 Thread Vince Bonfanti

This is already being done:

http://code.google.com/p/gaevfs/

I recently added a CachingDatastoreService class that implements a
write-behind cache using task queues (instead of cron), which will be
included in the next release. The code is available now in SVN:

    http://code.google.com/p/gaevfs/source/browse/trunk/src/com/newatlanta/appengine/datastore/CachingDatastoreService.java

GaeVFS supports two "filesystem"-like APIs, one based on Apache
Commons VFS (http://commons.apache.org/vfs/) and another based on the
JDK7 NIO2 project (http://openjdk.java.net/projects/nio/). The latter
API will also be in the next GaeVFS release, and the code is now in
SVN:

    http://code.google.com/p/gaevfs/source/browse/trunk/src/com/newatlanta/appengine/nio/

Vince

On Sat, Sep 12, 2009 at 2:53 PM, Larry Cable  wrote:
>
> I am wondering about writing a Servlet that would form/multi-part
> upload large files and cache them in memcache then use the
> cron API to "trickle" persist them into the DS over time ...
>
> could maybe even get adventurous and put a "filesystem"-like API
> over the cache ...
>
> lemme know if anyone would be interested in this ... if there is
> enough interest I'll put something together and open source it.
>




[appengine-java] Re: Bulk writes to datastore

2009-09-14 Thread Toby

Hello Vince,

That is pretty cool.  I will have a look.

Cheers,
Tobias




[appengine-java] Re: Bulk writes to datastore

2009-09-15 Thread Richard

Hi Larry

> I am wondering about writing a Servlet that would form/multi-part
> upload large files and cache them in memcache then use the
> cron API to "trickle" persist them into the DS over time ...

I've been thinking about using something like this as well.  I think
you could likely cache the upload in the store, because the limit here
seems to be mainly the number of entities, not the size of one entity
(below 1MB).  I have e.g. 100-200k worth of data that I upload, but
because it's represented as a couple hundred entities it chokes.  I
could just upload the 93k and fire off a task (or cron job) that would
parse and insert the data offline.

At the very least, I plan to use the low-level API more.  The (very
useful) performance testing app http://gaejava.appspot.com/ shows
consistently higher CPU usage from JDO.  If this ever improves, that
app should show it.  Until then, the low-level API looks good.
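
For comparison, a minimal low-level batch put might look like this
(kind and property names are invented):

    import java.util.ArrayList;
    import java.util.List;
    import com.google.appengine.api.datastore.DatastoreService;
    import com.google.appengine.api.datastore.DatastoreServiceFactory;
    import com.google.appengine.api.datastore.Entity;

    void lowLevelBatchPut(List<String> lines) {
        List<Entity> entities = new ArrayList<Entity>();
        for (String line : lines) {
            Entity e = new Entity("CsvRow"); // hypothetical kind
            e.setProperty("raw", line);
            entities.add(e);
        }
        DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
        ds.put(entities); // one batch call for the whole list
    }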

Regards,
Richard