There are two considerations: how many typos are likely, and how the local
filtering is done.

If the local result filtering is not relaxed about typos of the sort "Woh",
then it would make no sense at all to sort the consonants, since non-matching
results would get filtered out anyway.

If the local filtering can handle those typos, it is still a question of COST
vs. GAIN, and that decision I will leave to your gut. One should consider,
though, that most typos will probably happen during search input rather than
when inserting a file. And I must say, if a program is smart enough to handle
my search typos, I am likely to be very pleased. You have a much better idea
about the impact on the net, so I can't really say anything about that.
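To make the normalization concrete, here is a rough Python sketch of what I
described in my earlier mail below (split on non-alphanumerics, drop the
vowels, sort the remaining consonants, keep pure numbers as-is); the function
names are made up just for illustration:

```python
import re

VOWELS = set("aeiou")

def normalize_keyword(token):
    """Drop vowels and sort the consonants; pure numbers stay as-is."""
    if token.isdigit():
        return token  # e.g. "2008" stays "2008"
    consonants = [c for c in token.lower() if c.isalpha() and c not in VOWELS]
    return "".join(sorted(consonants)).upper()

def normalize_filename(name):
    """Split a filename on non-alphanumerics and normalize each token."""
    keys = []
    for token in re.split(r"[^A-Za-z0-9]+", name):
        key = normalize_keyword(token)
        if key:  # skip empty and all-vowel tokens
            keys.append(key)
    return keys

# "Woh_the.fuck_is ALICe(2008).divx.avi.WMV"
#   -> ['HW', 'HT', 'CFK', 'S', 'CL', '2008', 'DVX', 'V', 'MVW']
# "fuck alice" -> ['CFK', 'CL']
```

A search would simply run the query through the same function and then AND
the hashes of the resulting keys.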

regards leo

PS: I am wondering whether you have an opinion on the matters I am trying to
discuss in the forum.

On Wed, Jun 24, 2009 at 8:15 PM, Christian Grothoff
<[email protected]> wrote:

> I like this idea (at least as an option that should likely be the default)
> and have added it to the list of things to change for 0.9.x.  What I wonder
> is whether sorting the consonants should be omitted or not.  Some statistics
> on bad collisions with and without sorting would probably be nice to have...
>
> Christian
>
> On Tuesday 23 June 2009 07:27:17 leo stone wrote:
> > I believe the biggest factor in how we judge a system for future
> > usability is how many results we get if we are looking for "something"
> > like "something".
> > Imagine a shoe shop with only two pairs of shoes in it, and one with a
> > few hundred.
> >
> > The result in the end might be the same, you leave both shops not finding
> > what you want, but most people will consider the shop with a few hundred
> > pairs more promising and worth spending time on the next time they try to
> > find some shoes.
> >
> > So making sure people are getting results in their searches is probably
> > one of the more important issues, after my doubts about how the routing
> > is handled.
> >
> > Even though it might mean some significant overhead, I would consider
> > doing something like normalizing keywords. If it must be, per language,
> > but in the beginning English should be enough.
> >
> > So if I wanted to share the following file, and I would like it to be
> > public so people can find it, why not store it like this:
> >
> > "Woh_the.fuck_is ALICe(2008).divx.avi.WMV"  =>  { HW , HT , CFK , S , CL
> ,
> > 2008 , DVX , V ,  MVW }
> >
> > Put the file under the hashes of those nine "keywords".
> >
> > When I now search for "fuck alice"  =>  { CFK , CL }
> >
> > A search for h(CFK) AND h(CL) will return a lot of wrong but similar
> > results, which one can then filter locally in a more elaborate way.
> >
> > It might even be more selective than a search for h(video/x-msvideo).
> >
> > At least it returns results, whereas "Woh_the.fuck_is
> > ALICe(2008).divx.avi.WMV" as a keyword is something nobody is likely to
> > think of searching for, so the file would never be found and never be
> > spread ....., except by chance, of course.
> >
> > regards leo
>
>
_______________________________________________
GNUnet-developers mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnunet-developers
