I can give you a few more data points. For one of my recent projects, I
built the search index for one of the largest IM aggregators. It took in
around 2.5k chat messages/s and kept 400M messages in the index.

I looked at Solr, and while it is very convenient (luxurious, even), there
was no way in hell I could scale it this big. I ended up using Katta to
serve the index, with Hadoop computing the index shards.
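A rough sketch of what "computing index shards" in a batch job can look like, written in plain Python rather than as a Hadoop job: each document is routed to one of a fixed number of shards by hashing its ID, each shard gets its own inverted index, and a query fans out to every shard and merges the hits. The function names and the CRC32 routing are assumptions for illustration only; this is not Katta's or Hadoop's API.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() on str
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_shards(docs: Iterable[Tuple[str, str]], num_shards: int) -> List[Dict[str, Set[str]]]:
    # "Map": route each (doc_id, text) to a shard.
    # "Reduce": accumulate an inverted index (term -> doc IDs) per shard.
    shards = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs:
        postings = shards[shard_for(doc_id, num_shards)]
        for term in text.lower().split():
            postings[term].add(doc_id)
    return [dict(s) for s in shards]


def search_all(shards: List[Dict[str, Set[str]]], term: str) -> Set[str]:
    # Fan the query out to every shard and union the results.
    hits: Set[str] = set()
    for shard in shards:
        hits |= shard.get(term.lower(), set())
    return hits


docs = [("m1", "hello world"), ("m2", "hello there"), ("m3", "bye")]
shards = build_shards(docs, num_shards=4)
print(sorted(search_all(shards, "hello")))
```

Hash routing keeps shard sizes roughly even without any coordination, which is what lets the shards be built independently in a batch job and then handed to separate serving nodes.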

Although the whole system is batch-oriented, I got the latency (the time
for a doc to show up in the index) down to 2 min, as long as fewer than
8k chat messages/s were coming in.
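A back-of-the-envelope reading of that latency figure, assuming (hypothetically) that incoming messages are grouped into fixed time windows and each window becomes one index update: if a full 2-minute window were spent collecting at 8k msg/s, each batch would hold 8,000 * 120 = 960,000 messages. The actual batching scheme isn't described here, so the helpers below are illustrative only.

```python
from typing import Dict, Iterable, List


def batch_size(rate_msgs_per_s: int, window_s: int) -> int:
    """Messages that pile up during one window at a steady arrival rate."""
    return rate_msgs_per_s * window_s


def split_into_windows(timestamps: Iterable[float], window_s: float) -> List[List[float]]:
    """Group arrival times (in seconds) into consecutive fixed-size windows.

    Each returned list is one hypothetical batch handed to the indexer.
    Windows that received no messages are simply absent from the result.
    """
    windows: Dict[int, List[float]] = {}
    for t in timestamps:
        windows.setdefault(int(t // window_s), []).append(t)
    return [windows[k] for k in sorted(windows)]


print(batch_size(8_000, 120))  # 960000
```

Under this model, a higher arrival rate means bigger batches that take longer to index, which is one plausible reason the 2-minute figure only holds below a certain message rate.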

Katta handles replication and node failover (using ZooKeeper) and can
be scaled easily by adding nodes and increasing the replication factor.
Unlike with Solr, scale was not one of the things I had to worry about.
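To illustrate why adding nodes and raising the replication factor helps, here is a toy placement scheme: each shard is copied to `replication` distinct nodes in round-robin order, and a shard stays searchable as long as at least one node holding it is alive. This is a hypothetical sketch, not Katta's actual assignment logic (which is coordinated through ZooKeeper); the shard and node names are made up.

```python
from typing import Dict, List, Set


def assign_replicas(shards: List[str], nodes: List[str], replication: int) -> Dict[str, List[str]]:
    # Place each shard on `replication` consecutive nodes (wrapping around),
    # so every copy of a given shard lands on a different node.
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    plan: Dict[str, List[str]] = {}
    for i, shard in enumerate(shards):
        plan[shard] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return plan


def surviving_shards(plan: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    # A shard is still served if any of its holders is not in the failed set.
    return {s for s, holders in plan.items() if any(n not in failed for n in holders)}


plan = assign_replicas(["s0", "s1", "s2"], ["n0", "n1", "n2"], replication=2)
print(plan)
print(sorted(surviving_shards(plan, failed={"n1"})))
```

With a replication factor of 2, any single node can fail without losing a shard; adding nodes spreads the same shards (and query load) over more machines.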

As others have said, unless you provide a lot more specifics, it will
be hard to give you detailed recommendations.

Hope this helps!
-Erich

On Thu, Jan 7, 2010 at 11:31 PM, Richard Grossman <richie...@gmail.com> wrote:
> First, thanks for all your answers; they really helped us check all the aspects.
>
> In fact, the system we want to build has to manage a lot of data, but not in
> a heavily transactional way. Solr can handle the data but doesn't have
> a distributed way to serve it. But in my case it's always possible to just
> duplicate the data, and then we can load-balance the queries between multiple
> server instances.
>
> We load a large set of data once a week, and all this data is going to
> be used as is, without modification, update or delete. On this point, loading
> the data into Solr is very easy: we make a CSV file and that's it, it's
> inside.
>
> The data needs to be structured, but not like a relational database. Obviously
> Solr doesn't fit the required data structure. It forces us to denormalize a
> lot of data and build something like a very, very big table, and it also
> forces us to build very complicated Lucene queries.
>
> Query speed is critical because the application is internet-facing and we
> hope for a lot of queries per minute. On this point, the problem is that
> with the same amount of data, Solr has been faster than Cassandra, but of
> course the data structure is not the same.
>
> It seems that in the end we'll go, as Tatu suggested, with a hybrid solution
> mixing Solr and Cassandra. I'm not sure it's the best option in our case.
> Thanks

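One common way to split the work in a hybrid like the one Richard mentions (a sketch only; Tatu's exact proposal isn't quoted in this thread): the search index answers the query and returns matching IDs, and the full structured record for each ID is then fetched from the key-value store. The two classes below are hypothetical in-memory stand-ins for Solr and Cassandra, not their client APIs.

```python
from typing import Dict, List, Optional, Set


class ToySearchIndex:
    """Stand-in for Solr: an inverted index from terms to document IDs."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        for term in text.lower().split():
            self._postings.setdefault(term, set()).add(doc_id)

    def search(self, query: str) -> List[str]:
        # AND semantics: return IDs that contain every query term.
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(ids)


class ToyRecordStore:
    """Stand-in for Cassandra: key-value lookup of full records by ID."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, record: dict) -> None:
        self._rows[key] = record

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


def hybrid_query(index: ToySearchIndex, store: ToyRecordStore, query: str) -> List[dict]:
    # 1) Find matching IDs in the search index; 2) fetch full records by ID.
    results = []
    for doc_id in index.search(query):
        record = store.get(doc_id)
        if record is not None:
            results.append(record)
    return results
```

Because the data is reloaded once a week and never updated, both stores could be rebuilt from the same weekly batch, which avoids having to keep them in sync on every write.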