I was at a search vendor round table today...

2010-09-22 Thread Smiley, David W.
(I don't twitter or blog so I thought I'd send this message here)

Today at work (at MITRE outside DC) there was (is) a day of technical 
presentations about topics related to information dissemination and discovery 
(broad squishy words there, but mostly covered search) at which I spoke about 
the value of faceting, and gave a quick Solr pitch.  There was an hour-long vendor 
panel in which representatives from Autonomy, Microsoft (i.e. FAST), Google, 
Vivisimo, and Endeca had the opportunity to espouse the virtues of their 
products, and fit in an occasional jab at the competitors next to them.  In 
the absence of a suitable representative for Solr (e.g. Lucid) I pointed out 
how open-source Solr has democratized (i.e. made free) search and faceting 
when it used to require paying lots of money.  And I asked them how their 
products have reacted to this new reality.  Autonomy acknowledged they used to 
make millions on simple engagements in the distant past but that isn't the case 
these days.  Their representative said some other things about a huge, petabyte-sized 
hosted search collection they have that is used by banks... I forget what else he said.  
I forget what Google said.  Vivisimo quoted Steve Ballmer, saying open source is "as 
free as a free puppy" (not a bad point IMO).  Endeca claimed to be happy Solr 
exists because it raises the awareness of faceted search, but then claimed it 
would not scale and that users should then upgrade to Endeca.  (!)  I found that 
claim ridiculous, of course.

Speaking of performance, on a large scale search project where we're using Solr 
in place of a MarkLogic prototype (because ML is so friggin expensive, for one 
reason), the search results were so fast (~150ms) vs. the ML's results of 2-3 
seconds, that the UI engineers building the interface on top of the XML output 
thought Solr was broken because it was so fast.  The quote was "It's so fast, 
it's broken."  In other words, they were used to 2-3 second response times 
and so if the results came back as fast as what Solr has been doing, then 
surely there's a bug.  There's no bug.  :)  Admittedly, I think it was a bit of 
an apples and oranges comparison but I love that quote nonetheless.

~ David Smiley
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book




Re: I was at a search vendor round table today...

2010-09-22 Thread Grant Ingersoll

On Sep 22, 2010, at 2:04 PM, Smiley, David W. wrote:

> (I don't twitter or blog so I thought I'd send this message here)
>
> Today at work (at MITRE outside DC) there was (is) a day of technical 
> presentations about topics related to information dissemination and discovery 
> (broad squishy words there, but mostly covered search) at which I spoke 
> about the value of faceting, and gave a quick Solr pitch.  There was an hour 
> vendor panel in which a representative from Autonomy, Microsoft (i.e. FAST), 
> Google, Vivisimo, and Endeca had the opportunity to espouse the virtues of 
> their product, and fit in an occasional jab at their competitors next to 
> them.  In the absence of a suitable representative for Solr (e.g. Lucid) I 
> pointed out how open-source Solr has democratized (i.e. made free) search 
> and faceting when it used to require paying lots of money.  And I asked them 
> how their products have reacted to this new reality.  Autonomy acknowledged 
> they used to make millions on simple engagements in the distant past but that 
> isn't the case these days.  He said some other things about a huge petabyte 
> hosted search collection they have used by banks... I forget what else he 
> said.  I forgot what Google said.  Vivisimo quoted Steve Ballmer, saying 
> open source is as free as a free puppy (not a bad point IMO).

Too funny.  Hadn't heard that one before.  Presumably meaning you have to care 
for and feed it, despite the fact that you really do love it and it is cute as 
hell?  The care and feeding is true of the commercial ones, too, especially in 
terms of paying for supporting features you never use, but "love" (as in "we love 
using this tool") is not a word I often hear associated with them, 
though of course that is likely self-selecting.

> Endeca claimed to be happy Solr exists because it raises the awareness of 
> faceted search, but then claimed it would not scale and they should then 
> upgrade to Endeca.  (!)  I found that claim ridiculous, of course.

Having replaced all of the above on a number of occasions w/ Solr at a 
significant cost savings on licensing, dev time, and hardware, I would agree 
that claim is quite ridiculous.  Besides, in my experience, the scale claim is 
silly.  Everyone (customers) says they need scale, but few of them really know 
what scale is, so it is all relative.   For some, scale is 1M docs, for others 
it's 1B+ docs;  for others it's 100K queries per day, for others it's 100M per 
day.  (BTW, I've seen Lucene/Solr do both, just fine.  Not that it is a free 
lunch, but neither are the other ones despite what they say.)

 
> Speaking of performance, on a large scale search project where we're using 
> Solr in place of a MarkLogic prototype (because ML is so friggin expensive, 
> for one reason), the search results were so fast (~150ms) vs. the ML's 
> results of 2-3 seconds, that the UI engineers building the interface on top 
> of the XML output thought Solr was broken because it was so fast.  The quote 
> was "It's so fast, it's broken."  In other words, they were used to 2-3 
> second response times and so if the results came back as fast as what Solr 
> has been doing, then surely there's a bug.  There's no bug.  :)  Admittedly, 
> I think it was a bit of an apples and oranges comparison but I love that 
> quote nonetheless.


I love it.  I have had the same experience where people think it's broken b/c 
it's so fast.  Large vendor named above took 24 hours to index 4M records (they 
weren't even doing anything fancy on the indexing side) and search was slow 
too.  Solr took about 40 minutes to index all the content and search was 
blazing.  Same content, faster indexing, better search results, a lot less 
time. 

At any rate, enough of tooting our own horn.  Thanks for sharing!

-Grant


--
Grant Ingersoll
http://www.lucidimagination.com/



Re: I was at a search vendor round table today...

2010-09-22 Thread Walter Underwood
On Sep 22, 2010, at 11:04 AM, Smiley, David W. wrote:

> Speaking of performance, on a large scale search project where we're using 
> Solr in place of a MarkLogic prototype (because ML is so friggin expensive, 
> for one reason), the search results were so fast (~150ms) vs. the ML's 
> results of 2-3 seconds, that the UI engineers building the interface on top 
> of the XML output thought Solr was broken because it was so fast.  The quote 
> was "It's so fast, it's broken."  In other words, they were used to 2-3 
> second response times and so if the results came back as fast as what Solr 
> has been doing, then surely there's a bug.  There's no bug.  :) Admittedly, I 
> think it was a bit of an apples and oranges comparison but I love that quote 
> nonetheless.

I implemented Solr at Netflix and now I work at MarkLogic, and I strongly agree 
that the comparison is apples and oranges. MarkLogic does run very fast on very 
large datasets, so maybe that prototype was built to show functionality instead 
of speed. Also, MarkLogic already has a lot of stuff that is still in the 
future for Solr, like true real-time search, updating fields, and geospatial 
search.

Next time, invite the MarkLogic people, too. :-)

wunder
--
Walter Underwood
Lead Engineer
MarkLogic



Re: I was at a search vendor round table today...

2010-09-22 Thread Alexander Kanarsky
> He said some other things about a huge petabyte hosted search collection 
> they have used by banks..

In the context of your discussion, this reference sounds really, really funny... :)

-Alexander
