Re: Best way to DB import 1M+ rows?
Thanks John, that's a great suggestion. Unfortunately it's looking like it will take about 7.5 hours to import 3.12M rows:

1 tables, 3,123,800 records
companies: 1% | | ETA: 07:25:34

I'm wondering if there's a more expedient route... In the past I've used the Postgres COPY command [1] to do bulk imports of large data sets quickly, but that requires that the server be able to read a file off its local filesystem. I don't suppose that's feasible given how the Heroku platform works, but I would love to be pleasantly surprised :) Anyone from Heroku able to pipe up and offer any other possible suggestions?

Just to restate the problem: I have a single table with about 3.12M records that I want to transfer from a local DB to my remote Heroku DB without touching the other Heroku app data. It's ok if the table gets blown away on the Heroku side, as it currently has nothing in it (it's a new model I just added).

Happy Friday,
Zach

[1] http://www.postgresql.org/docs/8.4/interactive/sql-copy.html

On Thursday, December 9, 2010 at 4:36 AM, johnb wrote:
> If it's just a single table and you have it in a db locally, then db:push --tables tablename would get it up to Heroku - but this will replace the contents of the remote table with the local table rather than append to it. If the application is live, you could put it into maintenance mode, db:pull --tables tablename, append your rows to it, and then push the table back and put the app live... perhaps?
> John.

--
You received this message because you are subscribed to the Google Groups Heroku group.
To post to this group, send email to her...@googlegroups.com.
To unsubscribe from this group, send email to heroku+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/heroku?hl=en.
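One note on the COPY restriction Zach mentions: it applies to COPY given a server-side filename. psql's \copy meta-command issues the same bulk load but reads the file on the client and streams it over the connection, so no server filesystem access (or superuser right) is needed. A minimal sketch, built as a command string so it can be inspected; the table name "companies" and the file "companies.csv" are hypothetical placeholders:

```ruby
# COPY <table> FROM '<file>' is executed by the database *server* and
# needs superuser rights, but psql's \copy meta-command reads the file
# on the *client* and streams it over the connection. "companies" and
# "companies.csv" are hypothetical names for this sketch.
def psql_copy_command(table, csv_path)
  "\\copy #{table} FROM '#{csv_path}' WITH CSV HEADER"
end

cmd = psql_copy_command("companies", "companies.csv")
puts cmd
# You would run it against the remote database with something like:
#   psql "$DATABASE_URL" -c "#{cmd}"
```

This only helps where direct psql access to the database is available, which on Heroku at the time meant a dedicated database.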
Re: Best way to DB import 1M+ rows?
If you have a dedicated database, you can use psql to read off the local filesystem. http://docs.heroku.com/heroku-postgresql#psql-console

Alternatively, you can dump your data and then restore it. http://docs.heroku.com/pgbackups#import

On Fri, Dec 10, 2010 at 10:51 AM, Zach Bailey znbai...@gmail.com wrote:
> [quoted text clipped]
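The dump-and-restore route above can target a single table, which fits Zach's case. A sketch of the two commands, built as strings for clarity; "companies", "localdb", "remotedb", and the file name are hypothetical, and the real host/user/password flags would come from your database credentials:

```ruby
# Sketch: dump only one table's rows locally, then restore them into
# the remote database. All names here ("companies", "localdb",
# "remotedb", "companies.dump") are hypothetical placeholders, and the
# connection flags (-h/-U/-p) are omitted.
def dump_command(table, file)
  "pg_dump --data-only --format=custom --table=#{table} --file=#{file} localdb"
end

def restore_command(table, file)
  "pg_restore --data-only --table=#{table} --dbname=remotedb #{file}"
end

puts dump_command("companies", "companies.dump")
puts restore_command("companies", "companies.dump")
```

--data-only skips the CREATE TABLE so the restore only appends rows, and --table limits both commands to the one table, leaving the rest of the remote data untouched.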
Re: Best way to DB import 1M+ rows?
Another possible solution would be this:

Upload your data in CSV/TSV/whatever form to S3, then write a rake task that does the following:

* download from S3 to RAILS_ROOT/tmp
* use the psql command line tool (it's on our dyno grid) or one of the ActiveRecord bulk import extensions to read the file and import it into your database

Then you can run it with:

    heroku rake my_import_task

If this is going to be a regular process, you'll likely want to wrap all of this up as something you can run from a worker using DJ or its ilk.

On Dec 10, 2010, at 10:51 AM, Zach Bailey wrote:
> [quoted text clipped]
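David's steps can be sketched as a rake task. Everything specific below (the S3 bucket URL, the table name, and the task name) is a hypothetical placeholder, and the psql invocation is factored into a helper so the command it builds is easy to inspect:

```ruby
# Sketch of the S3-download-then-import rake task described above.
# The S3 URL, table name, and task name are hypothetical placeholders.
require "rake"
require "open-uri"
require "fileutils"
include Rake::DSL

# Build the client-side bulk-load command; \copy streams the file from
# the dyno (the client) to the database, so no server file access needed.
def psql_copy(table, path)
  %(psql "$DATABASE_URL" -c "\\copy #{table} FROM '#{path}' WITH CSV HEADER")
end

namespace :data do
  desc "Download a CSV from S3 and bulk-load it into the database"
  task :import_companies do
    path = "tmp/companies.csv"
    FileUtils.mkdir_p("tmp")
    # Download from S3 (a publicly readable URL is assumed for the sketch)
    File.binwrite(path, URI.open("https://s3.amazonaws.com/my-bucket/companies.csv").read)
    sh psql_copy("companies", path)
  end
end
```

You would then run `heroku rake data:import_companies` (or plain `rake data:import_companies` locally). In a real Rakefile the `require "rake"` / `include Rake::DSL` lines are unnecessary; they are only here so the sketch stands alone.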
Re: Best way to DB import 1M+ rows?
Thank you David and Peter for your awesome ideas.

Reading back over the Postgres COPY command docs [1], it doesn't look like that would ever be a feasible solution, given the following stipulation:

"Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access."

So it looks like this is something Heroku would have to wrap into an abstracted administrative function down the line, if they ever did it at all, due to the superuser access requirement.

Given that, the remaining options consist of a.) restoring a partial backup or b.) doing raw inserts.

a.) seems like it would be possible using a dedicated db + heroku pg:ingress + pg_restore -a -t tablename
b.) is of course possible via a variety of methods (rake task, REST API, taps, etc.), but it is unfortunately rather slow, as there is a lot of overhead in doing single inserts (compared to writing the table contents directly).

I'm thinking I'll give option (a) a try and see how it goes.

-Zach

[1] http://www.postgresql.org/docs/8.4/interactive/sql-copy.html

On Friday, December 10, 2010 at 3:12 PM, David Dollar wrote:
> [quoted text clipped]
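For completeness, the single-insert overhead in option (b) is what multi-row INSERT statements (the trick the ActiveRecord bulk-import extensions use) avoid: one statement, and one round trip, carries many rows. A sketch with hypothetical table and column names; the quoting below handles only plain string values, not NULLs or binary data:

```ruby
# Batch many rows into a single multi-row INSERT instead of one INSERT
# per row. Table and column names are hypothetical; the quoting only
# escapes embedded single quotes, which is enough for this sketch.
def bulk_insert_sql(table, columns, rows)
  values = rows.map { |r| "(" + r.map { |v| "'#{v.to_s.gsub("'", "''")}'" }.join(", ") + ")" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')};"
end

sql = bulk_insert_sql("companies", %w[name city],
                      [["Acme", "NYC"], ["O'Brien Ltd", "Boston"]])
puts sql
# => INSERT INTO companies (name, city) VALUES ('Acme', 'NYC'), ('O''Brien Ltd', 'Boston');
```

Batches of a few hundred to a few thousand rows per statement cut the per-insert overhead dramatically, though still not to COPY speeds.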
Re: Best way to DB import 1M+ rows?
Yeah, we're not too likely to build in some elaborate support for such a niche use case when there are tons of really universally useful things to work on. Let us know how it goes and if you run into any big problems.

Regards,
Peter

On Fri, Dec 10, 2010 at 12:35 PM, Zach Bailey znbai...@gmail.com wrote:
> [quoted text clipped]
Re: Best way to DB import 1M+ rows?
If it's just a single table and you have it in a db locally, then db:push --tables tablename would get it up to Heroku - but this will replace the contents of the remote table with the local table rather than append to it.

If the application is live, you could put it into maintenance mode, db:pull --tables tablename, append your rows to it, and then push the table back and put the app live... perhaps?

John.
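Spelled out as CLI calls, John's maintenance-mode workflow looks roughly like the sequence below; "myapp" and "companies" are hypothetical names, and as he notes, db:push replaces (rather than appends to) the remote table, which is why the pull-merge-push dance is needed:

```ruby
# The pull/merge/push sequence described above, as the heroku CLI calls
# it implies. "myapp" and "companies" are hypothetical placeholders.
steps = [
  "heroku maintenance:on --app myapp",
  "heroku db:pull --tables companies --app myapp",  # fetch the current remote rows
  # ...append the new rows to the local companies table here...
  "heroku db:push --tables companies --app myapp",  # push the merged table back
  "heroku maintenance:off --app myapp",
]
steps.each { |s| puts s }
```

Maintenance mode keeps writes from landing in the remote table between the pull and the push, which would otherwise be silently overwritten.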
Best way to DB import 1M+ rows?
I'm wondering if anyone has had experience importing a large amount of data into their Heroku database, and how they went about doing it. I'm staring down about 3-4M rows of (static) data that I'd like to get into our DB on Heroku.

If I had my own database locally, I would probably use the COPY command, as it is blazing fast. Inserting a row at a time from Ruby code or going through an exposed API looks like it will take multiple days (without spending extra money to crank up more dynos and buy a larger DB option).

Another option I thought of would be to figure out a way to load the data into a local db, then dump only that table and restore it into the Heroku db as a partial backup, but I didn't see any facility for that.

Has anyone done anything like this, or can anyone think of a good way to do it?

Thanks,
Zach
Re: Best way to DB import 1M+ rows?
If you're using a dedicated database (and if your queries are non-trivial, you probably want to move off the shared ones), then you can just use heroku pg:ingress + psql.

-p

On Wed, Dec 8, 2010 at 10:51 AM, Zach Bailey znbai...@gmail.com wrote:
> [quoted text clipped]
Re: Best way to DB import 1M+ rows?
Maybe heroku db:push would do that? But it's a one-off, as it would scrap all data on the server.

On 9 Dec 2010, at 2:51 AM, Zach Bailey wrote:
> [quoted text clipped]