Re: Call for Resources

Won Lee Mon, 21 Jun 2004 08:39:16 -0700

So your algorithm places too much weight on previous attendance?

On Fri, 18 Jun 2004 13:03:36 -0700 (PDT), Michael Haggerty
<[EMAIL PROTECTED]> wrote:
>
> Everybody -
>
> I am in need of some serious resources to help me with a problem. Well, not a problem the way drugs are a problem in America, but a problem like building an algorithm. Don't worry, no math skills required to help.
>
> I have built a text-mining app running on top of MySQL, as well as a lightweight OLAP interface to go along with it. The text-mining system is pretty sweet: it builds contextual subject trees of content from blogs for use in decision support. These trees exist as metadata and can be queried using simple SQL. The OLAP system works exactly the way I want it to, giving me the ability to drill down through multiple dimensions of data based on predefined filters (it's kind of like building dynamic queries).
>
> Here's my statement of the problem: how do you blend the results from these 2 subsystems into something meaningful? I have come up with a complex scoring mechanism based on a number of variables (discussed below), but I am concerned about the direction the project is going because I have no examples of anyone else's work to compare it to.
>
> Here's an example of what I am trying to do: someone is planning a meeting, and it has to do with environmental issues. In the database, there are the following sets of data:
>
> - Geocoded contact accounts
> - Geocoded attendance lists for earlier events
> - Frequency of Web Visits per User
> - Communications associated with user accounts
> - Other stuff I cannot go into
>
> I want to know who is most likely to attend the meeting, based on where they live and what their interests are. So I look at the data and score each user, giving weight to various factors such as proximity to the event, frequency of attendance, amounts of donations, attendance trends for that time of year, etc. The system first returns a screen showing the number of users who meet a certain threshold score for their probability of attendance (along with some other magic numbers), then allows me to drill down to get complete lists of who is likely to be there based on my user accounts.
>
> Right now, I have a formula that does this, but it is a little pessimistic. It will ignore someone who lives right next door to an event if they are a new contact, no matter how much interest they have in the environment, unless I send them an invitation first. I am building in some tolerance for ambiguity (in other words, teaching it how to ignore certain factors in some cases), but this is massively slowing down the queries and putting me in the position where I need to think about aggregating the data in other tables just to keep going. Of course, if I do this, it makes my algorithm dependent on the new database structure, and I do not want to get into a situation where I build and build and build, then rewrite rewrite rewrite ad nauseam. I would like to see how someone has done something similar and see if that sparks some brilliance on my part.
>
> Anyways, does anyone have any links / books / mailing lists / neighbors / anything at all they can point me at on this one? I know someone else had to have done the same thing once before, I just wish I really knew what to call it.
>
> M
>
>
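
For what it's worth, here is a rough sketch of the kind of weighted score described in the quoted message, and of how a heavy attendance weight could shut out a brand-new contact. The original formula was not posted, so the factor names, the weights, the threshold, and the "renormalize over the factors that have data" fallback are all assumptions made up for illustration, not the actual system.

```python
from dataclasses import dataclass

# Hypothetical weights and cutoff; the real formula was never shown.
WEIGHTS = {"proximity": 0.2, "attendance": 0.5, "interest": 0.2, "donations": 0.1}
THRESHOLD = 0.5
# Factors that are simply unknown (not "zero") for a new contact.
HISTORY_FACTORS = ("attendance", "donations")


@dataclass
class Contact:
    name: str
    proximity: float   # 1.0 = next door, 0.0 = far away
    attendance: float  # share of past events attended
    interest: float    # topical interest from the text mining, 0..1
    donations: float   # normalized donation history, 0..1
    is_new: bool = False


def naive_score(c, weights=WEIGHTS):
    """Plain weighted average: missing history counts as zero."""
    total = sum(weights.values())
    return sum(w * getattr(c, k) for k, w in weights.items()) / total


def tolerant_score(c, weights=WEIGHTS):
    """For new contacts, drop the history factors and renormalize
    over the remaining weights instead of treating them as zero."""
    if not c.is_new:
        return naive_score(c, weights)
    known = {k: w for k, w in weights.items() if k not in HISTORY_FACTORS}
    return naive_score(c, known)


neighbor = Contact("new neighbor", proximity=1.0, attendance=0.0,
                   interest=0.9, donations=0.0, is_new=True)
regular = Contact("regular", proximity=0.3, attendance=0.9,
                  interest=0.5, donations=0.4)

for person in (neighbor, regular):
    print(person.name,
          round(naive_score(person), 2),
          round(tolerant_score(person), 2),
          tolerant_score(person) >= THRESHOLD)
```

With these made-up numbers the new neighbor falls below the threshold under the naive score and clears it once the unknown history is left out, while the regular attendee is unaffected. Because the fallback is just a different set of weights, it could in principle be applied per row rather than as a separate pass over the data.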
