Re: Run query in background while showing "Please wait"

2008-04-01 Thread kdecapite

> Table cache is the name of what is happening, and your DB engine is
> who is doing it. Also about setting indexes:

Thanks for the links. I read through them but will have to again to
make sure the information sinks in. Overall, it sounds like the best
optimization is to only include fields in the query which are indexed.
This presents a fundamental problem for me since I am querying quite a
few fields (notice all the address fields) and I know creating an
index for all of these columns is probably not the best solution
either. In short, I'm not sure how much the index is really helping in
my particular case.

> B Logica's query :
>
> Will return :
>
> 100 John Smith // this matches "1 John Smith"
> 100 John Smith // this matches "2 John Smith"
>
> So, fix it this way:
>
> SELECT  c.id, c.fn, c.ln FROM contacts AS c INNER JOIN import_contacts
> AS ic ON c.fn = ic.fn AND c.ln = ic.ln GROUP BY (c.id);

Thanks for the explanation of why I'm getting the duplicates in the
result set, I think I just needed it visually spelled out. At any
extent, I tried the DISTINCT and GROUP BY options but decided to
remove them and settled on this query for now:

SELECT
ic.id, ic.fn, ic.ln, c.id, ic.phone_home, ic.line1, ic.line2, ic.city,
ic.st, ic.zip,
c.phone_home,
a.line1, a.line2, a.city, a.st, a.zip
FROM import_contacts AS ic INNER JOIN contacts AS c ON c.fn = ic.fn
AND c.ln = ic.ln
LEFT JOIN addresses AS a ON c.id = a.contact_id
ORDER BY ic.ln, ic.fn

I get 25 records this way (obviously some are retrieved twice) but I
actually want that. This is because the result set is being displayed
to the client as a 2-column layout (in HTML I mean, not database
tables :) with the POTENTIAL duplicates from the import_contacts table
on the left and each of their corresponding existing entries on the
right. I believe the process of filtering out duplicates still needs
to be a manual process because two John Smith's will likely exist and
we have to determine if the John Smith awaiting import is updating his
information, repeating it or actually a completely different John
Smith altogether.

BTW, the above query executes in about 32-47ms! It even retrieves all
the data I actually need for my view (imagine that ;) Now I just need
to change some HTML code and I think I'll be in good shape...
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: Run query in background while showing "Please wait"

2008-03-31 Thread kdecapite

> Oh, and don't forget to create an index on (fn, ln) for both tables.

Sorry for being out of touch lately. I had to table this issue to make
progress on a different work-related project. Anyway, I spent some
time reading up on indexes. Just by adding the (ln, fn) index to both
tables (contacts and import_contacts), my query went from an 18-21
second execution time to a 9-10 second execution time, but not
consistently. I ran the query again a few minutes later and it took 18
seconds still. I tried again 10-15 minutes later and the execution
time went back down to 11 seconds. Not sure what's happening here
exactly.

> Sounds about right, except for the difference in number of dupes. How many 
> are there in total?

Using my method of finding duplicates, I'm counting 14 matches.
However, when I looked again at b logica's query, I'm really getting
16 matches and not 24.

Here's why:

Even though 24 records are returned, some of them are duplicate names
due to record redundancy (a separate issue related to how the data is
managed by my client). For instance, the name "Joe Rohan" appears 3
times simply because this name exists 3 times in the contacts table as
individual records (there are 3 Joe Rohan's in the contacts table, I
mean).

The only confusing thing is that some names appear more than once in b
logica's result set but they are displaying the same "contact.id"
value. For example, the contact with the name "Kami Dolney" appears
twice in the result set, but each record is showing the same primary
key ID. So Kami Dolney does not exist twice in the contacts table. I'm
not sure why this contact is appearing twice in the query's result
set.

Even more confounding is the fact that this contact (Kami Dolney) does
NOT even appear at all as a duplicate using my method of finding dupes
(along with a couple other names). In other words, b logica's query
seems to be returning an accurate result set of matching contact names
from both tables.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: Run query in background while showing "Please wait"

2008-03-27 Thread kdecapite

Thanks to everyone who has contributed so far, it's much appreciated!

Now, here's where we stand...

jonknee's query took about 25-27 seconds and returned 154 records.

b logica's query took about 6 seconds and returned 24 records

My original query takes 18 seconds and returns ALL records and is
exactly this:

SELECT CONCAT(`Contact`.`ln`, `Contact`.`fn`) AS `full_name`,
`Contact`.`id`, `Contact`.`created`, `Contact`.`modified`,
`Contact`.`fn`, `Contact`.`ln`, `Contact`.`co`, `Contact`.`email1`,
`Contact`.`phone_home`, `Address`.`line1`, `Address`.`line2`,
`Address`.`city`, `Address`.`st`, `Address`.`zip` FROM `contacts` AS
`Contact` LEFT JOIN `addresses` AS `Address` ON `Contact`.`id` =
`Address`.`contact_id` WHERE `Contact`.`ln` <> '' AND `Contact`.`fn`
<> '' AND `Contact`.`ln` IS NOT NULL AND `Contact`.`fn` IS NOT NULL
ORDER BY `Contact`.`ln` ASC, `Contact`.`fn` ASC

Then I store the results into an array and run this query:

SELECT * FROM `import_contacts` AS `ImportContact` ORDER BY `ln` ASC,
`fn` ASC

I store those results into an array, too, then I loop through the
ImportContact array looking for a matching key (PHP's array_key_exists
function). I can do this because I format the two result set arrays
using a concatenated string as the key of the array. For instance, a
record with a first name of "Mark" and last name of "Smith" gets
stored as:

array("MarkSmith" => array("field1" => "value1", "field2" =>
"value2"));

This is working great, it's just timing out my PHP script because it
can take 3+ minutes to execute the whole thing across the 30k records
I have :-/


On Mar 27, 5:44 pm, "Dardo Sordi Bogado" <[EMAIL PROTECTED]> wrote:
> >  SELECT * FROM contactImport WHERE first_name IN (SELECT first_name
> >  FROM contact) AND last_name in (SELECT last_name FROM contact)
>
> That is a really bad query (for 30k records) and even don't solve the issue.
>
> >  b logica's way would work too, so you could just check whatever is
> >  faster. I don't have 30,000 records to test that out on :P.
>
> blogica's one seems to be right.
>
>
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: Run query in background while showing "Please wait"

2008-03-27 Thread kdecapite

> If I'm understanding it correctly, you just have a full name in the import
> model and broken out first and last names in your contact model.

I may have not explained the setup properly. I don't actually have a
"full_name" field in either table (import_contacts and contacts are
the actual MySQL table names). Each of these tables have "fn" and "ln"
fields, however. So the problem is that I need to somehow find all
records in the contacts table which have the same values in the "fn"
and "ln" fields as the import_contacts table.

> Hmm. Is there a way you could make them directly correlate? It's a
> pretty fast DB query if you can get them to line up.

The only idea I have at the moment is to create a field in both tables
which simply stores the concatenated value of a record's "fn" and "ln"
fields. This would require me to write a script to retro-fit the
existing 30k contact records, as well as update some logic in my "add
contact" controller. Not to mention this would also create a redundant
field in the table and still doesn't ultimately solve the problem
because what if in the future I want to find duplicates matching "fn",
"ln" and "mi" (middle initial)? I would have to create yet another
"dummy" field in my table.

> The beauty of doing the heavy lifting in SQL, even if you can't do it
> in the ORM, is that you only have to bring the matches across the
> wire. So instead of a minimum of 30,000 records coming out of the DB
> (and each getting messed around with Cake's ORM) you get only the
> dupes.

Oh I hear you on this one :) Once I realized I *couldn't* do it with
SQL I was quite bummed! Of course, maybe there's still a way...

- Kevin

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Re: Run query in background while showing "Please wait"

2008-03-26 Thread kdecapite

What I have is an ImportContact model which is simply a table
containing basic information about a person (name, email, address,
etc). This table gets populated either from a standard website
"Contact Us" form or by my client manually importing a CSV file. Now,
once this ImportContact model has new records in it, I want to display
a list of *potential* duplicates by various criteria. The "master"
contacts model is simply called Contact and contains 30,000+ records.
The Contact model does NOT include addresses, there is a separate
Address model for this (again, with 30,000+ records).

The reason I can't have the database do the searching for me is
because the criteria I want to search against to find potential
duplicates doesn't have a direct one-to-one correlation to the
database table fields.

For example this "duplicate search" criteria works great using Cake:

- Show potential duplicates based on field `email1`
- Cake will generate a SELECT...IN query, listing ALL email addresses
from the ImportContact model and returning all records from the
Contact model containing anyone of the listed emails

However, this example does NOT work:

- Show potential duplicates based on fields `fn` and `ln` (a person's
full name)
- Since I have to CONCAT(`fn`, `ln`) AS `full_name`, the SELECT...IN
syntax no longer works

So what I did was query ALL records from both ImportContact and
Contact models and store each of these result sets in their own
arrays. Then I perform the search using PHP array functions and store
the potential duplicates in a third array. This is working great, but
it's just on the slow side (sometimes 3+ minutes). It's not just the
array searching that's taking a while either, it's actually the
database query retrieving 30,000+ records that's slowing things down
as well.

I'm just giving a brief overview here and can provide specific code if
you like. Thanks for the quick replies, maybe I can get this thing to
run more efficiently but I'm not quite sure how :-/

- Kevin


On Mar 26, 4:14 pm, jonknee <[EMAIL PROTECTED]> wrote:
> > I have a CakePHP CRM application with 30,000+ contact records. Each
> > Contact hasMany Address records. One feature of my app is to allow my
> > client to search for duplicate entries within the contacts table
> > before importing new contacts. In short, I am querying the entire
> > Contact table and LEFT JOINing with the Address table. I store the
> > result set into an array and then use PHP's array functions to filter
> > potential duplicates from a "pending contacts" table.
>
> What exactly are you doing? It can probably be handled pretty quickly
> by the database. You shouldn't have to manually fish out dupes.
>
> I always thought the please wait while we search messages were either
> a [poor] attempt to seem extra high-tech or a sign that they have a
> really inefficient system. Google searches the whole web--not just a
> day's commercial flights--and comes back in a flash.
>
> I have used a PHP app with a valid reason for it, it was a mailing
> list program and it accomplished sending large amounts of email out by
> batching (the page would keep reloading with incrementing GET
> variables, such as start=0&num=100). It wouldn't work in your case.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---



Run query in background while showing "Please wait"

2008-03-26 Thread kdecapite

I have a CakePHP CRM application with 30,000+ contact records. Each
Contact hasMany Address records. One feature of my app is to allow my
client to search for duplicate entries within the contacts table
before importing new contacts. In short, I am querying the entire
Contact table and LEFT JOINing with the Address table. I store the
result set into an array and then use PHP's array functions to filter
potential duplicates from a "pending contacts" table.

What I want to do is somehow run this entire process (the database
query and array comparison) in the background (perhaps with a PHP
exec() call) and show some type of "Loading, please wait..." interface
to the application user. I'm thinking something similar to Expedia's
"Please wait while we search for your flight" interface.

The problem is, how would I do this? Any ideas or suggestions are
welcome! Thanks in advance...

- Kevin

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Cake 
PHP" group.
To post to this group, send email to cake-php@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/cake-php?hl=en
-~--~~~~--~~--~--~---