Hi,

> My only concern would be that someone might come along and use the
> application to insert a person while my ETL process is running,
> causing one of my inserts to fail. I guess I could trap that exception
> and update my internal ID counter or something...
The bulk upload is not easy to troubleshoot - its error messages are at best opaque... Thinking off the top of my head: if you split the process into separate transform and load steps, and the transform takes most of the time while the load is quick, that should shrink the window in which such a conflict can occur. You could then read a base key value at the start of the load and add it to all of the calculated key values as you insert the rows (locking the table at the same time).

> My last question is regarding your statement "and then use the output
> of the 'person' pipeline as input to a join in the 'person_phone'
> pipeline". I thought joins were for taking two rows with different
> columns and joining them into a merged row with all of the columns. Is
> there an example anywhere of using joins to represent parent/child,
> one-to-many relationships?

Yes, but if I remember correctly the rows are cached as they are loaded, so you can have multiple matches... My memory is a little hazy, so perhaps check the code? (Also, if so, ignore my comment about memory use - sorting will probably give you quicker lookups, but it won't stop rows building up in memory.)

Miles

--
You received this message because you are subscribed to the Google Groups "Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/rhino-tools-dev?hl=en.
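P.S. In case it helps, here is a minimal sketch of the base-key offsetting idea described above. It is not Rhino ETL code - the table names, columns, and the use of SQLite are my own assumptions purely for illustration. The transform phase assigns provisional keys 0..n-1; the load phase takes a write lock, reads MAX(id) once as the base, and offsets every parent and child key by it before inserting:

```python
# Hypothetical load step: provisional keys from the transform phase are
# shifted by a base value read inside a single locking transaction, so
# concurrent application inserts cannot collide with the ETL's keys.
import sqlite3

def load(conn, person_rows, phone_rows):
    """person_rows: [(provisional_id, name)];
    phone_rows: [(provisional_person_id, phone)]."""
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # take the write lock for the whole load
    # Base key: one read at the start of the load, under the lock.
    base = cur.execute("SELECT COALESCE(MAX(id), 0) FROM person").fetchone()[0] + 1
    # Offset the parent keys...
    cur.executemany(
        "INSERT INTO person (id, name) VALUES (?, ?)",
        [(base + pid, name) for pid, name in person_rows],
    )
    # ...and the same offset keeps the child rows pointing at their parents.
    cur.executemany(
        "INSERT INTO person_phone (person_id, phone) VALUES (?, ?)",
        [(base + pid, phone) for pid, phone in phone_rows],
    )
    conn.commit()
```

Because parent and child rows share the same provisional ids, one offset preserves the one-to-many relationship without any per-row lookups.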
