For your use case, you'll want to do a ranged import (i.e., SELECT *
FROM foo WHERE id > X AND id <= Y), and then delete the same records after
the import succeeds (DELETE FROM foo WHERE id > X AND id <= Y). Before the
import, you can SELECT MAX(id) FROM foo to establish what Y should be
(note the inclusive upper bound -- with a strict id < Y, the row at the
high-water mark would never be picked up); X carries over from the
previous import operation.
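As an illustration (not from the thread), here's a minimal sketch of that import-then-delete cycle. It uses SQLite as a stand-in for MySQL -- the SQL is the same shape, but a real setup would use a MySQL driver -- and the table name "foo", the payload column, and the id bounds are assumptions for the example.

```python
import sqlite3

# In-memory stand-in for the MySQL table the thread calls "foo".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO foo (id, payload) VALUES (?, ?)",
                 [(i, "row-%d" % i) for i in range(1, 11)])

# X is remembered from the previous run; Y is the current high-water mark.
x = 0
(y,) = conn.execute("SELECT MAX(id) FROM foo").fetchone()

# "Import": pull the range out. In practice these rows would be written
# to a file destined for HDFS rather than held in a list.
rows = conn.execute(
    "SELECT * FROM foo WHERE id > ? AND id <= ?", (x, y)).fetchall()

# Only after the import succeeds, delete the very same range.
conn.execute("DELETE FROM foo WHERE id > ? AND id <= ?", (x, y))
conn.commit()

print(len(rows))  # rows exported this cycle
```

On the next cycle, X becomes this run's Y, so concurrent writers that inserted rows with id > Y during the import are untouched and get picked up next time.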

You said that you're concerned with the performance of DELETE, but I don't
know a better way around this if all your input sources are forced to write
to the same table. Ideally you could have a "current" table and a "frozen"
table: writes always go to the current table, and the import is done from the
frozen table. Then you can DROP TABLE frozen relatively quickly post-import.
At the time of the next import you swap which table is current and which is
frozen, and repeat. In MySQL you can create updatable views, so you might
want to use a view as an indirection pointer to atomically switch all
your writers from one underlying table to the other.
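The swap itself can be sketched as follows (again an illustration, using SQLite as a stand-in; since SQLite lacks updatable views, the indirection is shown with table renames -- in MySQL you'd repoint an updatable view, or use a single atomic RENAME TABLE. The table and column names are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo_current (id INTEGER PRIMARY KEY, payload TEXT)")

# Writers insert into the current table as usual.
conn.executemany("INSERT INTO foo_current (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])

# Import time: freeze the current table and give writers a fresh one.
conn.execute("ALTER TABLE foo_current RENAME TO foo_frozen")
conn.execute("CREATE TABLE foo_current (id INTEGER PRIMARY KEY, payload TEXT)")

# Export everything from the frozen table (destined for HDFS in practice)...
exported = conn.execute("SELECT * FROM foo_frozen").fetchall()

# ...while new writes land in the fresh current table, unaffected.
conn.execute("INSERT INTO foo_current (payload) VALUES ('d')")

# Cleanup is a cheap DROP instead of an expensive ranged DELETE.
conn.execute("DROP TABLE foo_frozen")
conn.commit()

print(len(exported))  # rows handed off this cycle
```

The key property is that the import never races with the writers: by the time you read foo_frozen, nothing is writing to it, and dropping it reclaims the space without row-by-row DELETE cost.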

I'll put in a shameless plug here -- I'm developing a tool called sqoop
designed to import from databases into HDFS; a patch is available at
http://issues.apache.org/jira/browse/hadoop-5815. It doesn't currently
support WHERE clauses, but that's on the roadmap. Please check it out and
let me know what you think.

Cheers,
- Aaron


On Wed, May 20, 2009 at 9:48 AM, dealmaker <vin...@gmail.com> wrote:

>
> No, my prime objective is not to backup db.  I am trying to move the
> records
> from mysql db to hadoop for processing.  Hadoop itself doesn't keep any
> records.  After that, I will remove the same mysql records processed in
> hadoop from the mysql db.  The main point isn't about getting the mysql
> records, the main point is removing the same mysql records that are
> processed in hadoop from mysql db.
>
>
> Edward J. Yoon-2 wrote:
> >
> > Oh, as I understand it, you want to maintain a steady DB size by
> > deleting and backing up the old records. If so, I guess you can do that
> > continuously using WHERE and LIMIT clauses, which should reduce the I/O
> > costs. Or does it need to be dumped all at once?
> >
> > On Thu, May 21, 2009 at 12:48 AM, dealmaker <vin...@gmail.com> wrote:
> >>
> >> Other parts of the non-hadoop system will continue to add records to
> >> the mysql db while I move those records (and remove the very same
> >> records from the mysql db at the same time) to hadoop for processing.
> >> That's why I am doing those mysql commands.
> >>
> >> What are you suggesting?  If I do it like you suggest, dump all records
> >> from
> >> mysql db to a file in hdfs, how do I remove those very same records from
> >> the
> >> mysql db at the same time?  Just rename it first and then dump them and
> >> then
> >> read them from the hdfs file?
> >>
> >> or should I do it my way?  which way is faster?
> >> Thanks.
> >>
> >>
> >> Edward J. Yoon-2 wrote:
> >>>
> >>> Hadoop is a distributed filesystem. If you wanted to backup your table
> >>> data to hdfs, you can use SELECT * INTO OUTFILE 'file_name' FROM
> >>> tbl_name; Then, put it to hadoop dfs.
> >>>
> >>> Edward
> >>>
> >>> On Thu, May 21, 2009 at 12:08 AM, dealmaker <vin...@gmail.com> wrote:
> >>>>
> >>>> No, actually I am using mysql.  So it doesn't belong to Hive, I think.
> >>>>
> >>>>
> >>>> owen.omalley wrote:
> >>>>>
> >>>>>
> >>>>> On May 19, 2009, at 11:48 PM, dealmaker wrote:
> >>>>>
> >>>>>>
> >>>>>> Hi,
> >>>>>>  I want to backup a table and then create a new empty one with
> >>>>>> following
> >>>>>> commands in Hadoop.  How do I do it in java?  Thanks.
> >>>>>
> >>>>> Since this is a question about Hive, you should be asking on
> >>>>> hive-u...@hadoop.apache.org
> >>>>> .
> >>>>>
> >>>>> -- Owen
> >>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> View this message in context:
> >>>>
> http://www.nabble.com/How-to-Rename---Create-DB-Table-in-Hadoop--tp23629956p23637131.html
> >>>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best Regards, Edward J. Yoon @ NHN, corp.
> >>> edwardy...@apache.org
> >>> http://blog.udanax.org
> >>>
> >>>
> >>
> >> --
> >> View this message in context:
> >>
> http://www.nabble.com/How-to-Rename---Create-Mysql-DB-Table-in-Hadoop--tp23629956p23638051.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >>
> >>
> >
> >
> >
> > --
> > Best Regards, Edward J. Yoon @ NHN, corp.
> > edwardy...@apache.org
> > http://blog.udanax.org
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/How-to-Rename---Create-Mysql-DB-Table-in-Hadoop--tp23629956p23639294.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>