Re: Best way to DB import 1M+ rows?

2010-12-10 Thread Zach Bailey

 Thanks John, that's a great suggestion. Unfortunately it's looking like it 
will take about 7.5 hours to import 3.12M rows:


1 tables, 3,123,800 records
companies:       1% |                                          | ETA: 07:25:34


I'm wondering if there's a more expedient route... in the past I've used the 
postgres COPY command [1] to do bulk imports of large data sets quickly, but 
that requires that the server be able to read the file off its own local 
filesystem. I don't suppose that's feasible given how the Heroku platform 
works, but would love to be pleasantly surprised :)
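
(For reference, this is the sort of one-shot load I mean; the table and path 
are made up:

  psql mydb -c "COPY companies FROM '/path/on/the/db/server/companies.csv' WITH CSV HEADER"

The snag is that COPY resolves that path on the database server, not on my 
machine.)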


Anyone from Heroku able to pipe up and offer any other possible suggestions? 
Just to restate the problem, I have a single table with about 3.12M records 
that I'm wanting to transfer from a local DB to my remote Heroku DB without 
touching the other Heroku app data. It's ok if the table gets blown away on the 
Heroku side as it has nothing in it (new model I just added).

Happy Friday,
Zach



[1] http://www.postgresql.org/docs/8.4/interactive/sql-copy.html


Re: Best way to DB import 1M+ rows?

2010-12-10 Thread Oren Teich
If you have a dedicated database, you can use psql to read the file off your
local filesystem: psql's \copy meta-command runs COPY on your behalf,
streaming a client-side file to the server.
http://docs.heroku.com/heroku-postgresql#psql-console

Alternatively, you can dump your data and then restore it.
http://docs.heroku.com/pgbackups#import
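
A rough sketch of the dedicated-db route (host, user, and dbname come from
your pg:ingress session; companies.csv is a placeholder):

  heroku pg:ingress
  psql -h <host> -U <user> <dbname>

then, at the psql prompt:

  \copy companies FROM 'companies.csv' WITH CSV HEADER

\copy reads the file on your machine and streams it over the connection, so
it doesn't hit COPY's server-side-file and superuser restrictions.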


Re: Best way to DB import 1M+ rows?

2010-12-10 Thread David Dollar
Another possible solution would be this:

Upload your data in CSV/TSV/whatever form to S3. Write a rake task that does 
the following:

* download from S3 to RAILS_ROOT/tmp
* use the psql command line tool (it's on our dyno grid) or one of the 
ActiveRecord bulk import extensions to read the file and import to your database

Then you can run it with heroku rake my_import_task

If this is going to be a regular process, you'll likely want to wrap all of 
this up as something you can run from a worker using DJ or its ilk.
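
In shell terms the task would boil down to something like this (the bucket,
file, table, and connection details are all placeholders you'd fill in from
your app's config):

  curl -o tmp/companies.csv https://mybucket.s3.amazonaws.com/companies.csv
  psql -h <host> -U <user> <dbname> -c "\copy companies FROM 'tmp/companies.csv' WITH CSV HEADER"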



Re: Best way to DB import 1M+ rows?

2010-12-10 Thread Zach Bailey

 Thank you David and Peter for your awesome ideas. Reading back over the 
postgres COPY command docs [1] it doesn't look like that would ever be a 
feasible solution given the following stipulation:


  "Files named in a COPY command are read or written directly by the server,
  not by the client application. Therefore, they must reside on or be
  accessible to the database server machine, not the client. They must be
  accessible to and readable or writable by the PostgreSQL user (the user ID
  the server runs as), not the client. COPY naming a file is only allowed to
  database superusers, since it allows reading or writing any file that the
  server has privileges to access."


So, it looks like this is something Heroku would have to wrap into an 
abstracted administrative function down the line, if they ever did it at all, 
due to the super-user access requirement.


Given that, the remaining options consist of a.) restoring a partial backup or 
b.) doing raw inserts.


a.) seems like it would be possible using a dedicated db + heroku pg:ingress + 
pg_restore -a -t tablename
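
Concretely, something like this (the local db and dump file names are made 
up; host/user/dbname would come from pg:ingress):

  pg_dump -a -t tablename -Fc mylocaldb > tablename.dump
  heroku pg:ingress
  pg_restore -a -t tablename -h <host> -U <user> -d <dbname> tablename.dump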


b.) is of course possible via a variety of methods (rake task, REST API, taps, 
etc) but unfortunately rather slow as there is a lot of overhead in doing 
single inserts (when compared to writing table structure directly, etc.)


I'm thinking I'll give option (a) a try and see how it goes.

-Zach



[1] http://www.postgresql.org/docs/8.4/interactive/sql-copy.html


Re: Best way to DB import 1M+ rows?

2010-12-10 Thread Peter van Hardenberg
Yeah, we're not too likely to build in some elaborate support for such a
niche use-case when there are tons of really universally useful things to
work on. Let us know how it goes and if you run into any big problems.

Regards,
Peter


Re: Best way to DB import 1M+ rows?

2010-12-09 Thread johnb
If it's just a single table and you have it in a db locally then
db:push --tables tablename would get it up to heroku - but this will
replace the contents of the remote table with the local table rather than
append to it. If the application is live you could put it into
maintenance mode, db:pull --tables tablename, append your rows to it
locally, and then push the table back and put the app live...
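
i.e. something like:

  heroku db:push --tables tablename
  heroku db:pull --tables tablename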


perhaps?


John.




Best way to DB import 1M+ rows?

2010-12-08 Thread Zach Bailey
 I'm wondering if anyone has had experience importing a large amount of data 
into their heroku database and how they went about doing it. I'm staring down 
about 3-4M rows of (static) data that I'd like to get into our DB on Heroku.


If I had my own database locally I would probably use the COPY command, as it 
is blazing fast. Inserting a row at a time from Ruby code or going through an 
exposed API looks like it will take multiple days (without spending extra 
money to crank up more dynos and buy a larger DB option).


Another option I thought of would be to figure out a way to load the data in a 
local db then dump only that table and restore it into the heroku db as a 
partial backup, but I didn't see any facility for that.


Has anyone done anything like this, or can anyone think of a good way to do it?


Thanks,
Zach




Re: Best way to DB import 1M+ rows?

2010-12-08 Thread Peter van Hardenberg
If you're using a dedicated database (and if your queries are non-trivial
you probably want to move off the shared ones) then you can just use heroku
pg:ingress + psql.

-p




Re: Best way to DB import 1M+ rows?

2010-12-08 Thread Yeung William
Maybe heroku db:push would do that? But it's a one-off, as it would scrap all 
the data on the server.

