I can give you a few more data points. For one of my recent projects, I built the search index for one of the largest IM aggregators. I was getting around 2.5k chat messages/s and keeping 400M messages in my index.
I looked at Solr, and while it is very convenient/luxurious, there was no way in hell I could scale it this big. I ended up using Katta to serve the index, with Hadoop computing my index shards. The whole system is batch-oriented, but I got the latency (the time for a doc to show up in the index) down to 2 minutes, as long as fewer than 8k chat messages/s were coming in. Katta handles replication and node failover (it uses ZooKeeper) and can be scaled easily by adding nodes and increasing the replication factor. Compared to Solr, scale was not something I had to worry about.

Like others have said, unless you provide a lot more specifics, it will be hard to give you detailed recommendations.

Hope this helps!
-Erich

On Thu, Jan 7, 2010 at 11:31 PM, Richard Grossman <richie...@gmail.com> wrote:
> First, thanks to all of you for your answers; they really helped us check
> all the aspects.
>
> In fact, the system we want to build has to manage a lot of data, but not
> in a heavily transactional way. Solr can handle the data but doesn't have a
> distributed way to serve it. But in my case it's always possible to just
> duplicate the data; then we can load-balance the queries across multiple
> server instances.
>
> We load a large set of data once a week, and all of this data is then used
> as-is, without modifications, updates, or deletes. On this point, loading
> the data into Solr is very easy: we make a CSV file and that's it, it's
> inside.
>
> The data needs to be structured, but not like a relational database.
> Obviously Solr doesn't fit the required data structure: it forces us to
> denormalize a lot of data and build something like a very, very big table,
> and it also forces us to build very complicated Lucene queries.
>
> Query speed is critical because the application is Internet-facing, and we
> expect a lot of queries per minute. On this point, the problem is that with
> the same amount of data Solr has been faster than Cassandra, but of course
> the data structure is not the same.
>
> It seems that in the end we'll go, as Tatu suggested, with a hybrid
> solution mixing Solr and Cassandra. I'm not sure it's the best for our case.
>
> Thanks
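The replication and failover idea mentioned for Katta can be illustrated with a small in-process sketch. This is not Katta's API or its ZooKeeper-based implementation: `place_shards`, `QueryRouter`, and the shard/node names are hypothetical, and the round-robin placement is just one simple assumed policy. Each shard is assigned to `replication_factor` distinct nodes, and queries for a shard rotate across its live replicas, skipping nodes that have been marked dead. Setting `replication_factor` equal to the number of nodes corresponds to fully duplicating the data on every instance, as in the plan of load-balancing queries across copies of a Solr index.

```python
"""Hypothetical sketch: replicated shard placement with query failover."""


def place_shards(shards, nodes, replication_factor):
    """Assign each shard to `replication_factor` distinct nodes, round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, shard in enumerate(shards):
        # Consecutive nodes (mod len(nodes)) are distinct as long as
        # replication_factor <= len(nodes), which is checked above.
        placement[shard] = [
            nodes[(i + r) % len(nodes)] for r in range(replication_factor)
        ]
    return placement


class QueryRouter:
    """Route each shard query to a live replica, rotating between replicas."""

    def __init__(self, placement):
        self.placement = placement
        # Initially every node that holds a replica is considered live.
        self.live = {n for replicas in placement.values() for n in replicas}
        # Per-shard cursor for round-robin load balancing.
        self._next = {shard: 0 for shard in placement}

    def mark_dead(self, node):
        """Simulate node failure (in Katta, ZooKeeper would detect this)."""
        self.live.discard(node)

    def route(self, shard):
        """Return the next live replica for `shard`, skipping dead nodes."""
        replicas = self.placement[shard]
        for _ in range(len(replicas)):
            idx = self._next[shard]
            self._next[shard] = (idx + 1) % len(replicas)
            node = replicas[idx]
            if node in self.live:
                return node
        raise RuntimeError(f"no live replica for {shard}")


if __name__ == "__main__":
    placement = place_shards(
        ["shard-0", "shard-1", "shard-2", "shard-3"],
        ["node-a", "node-b", "node-c"],
        replication_factor=2,
    )
    router = QueryRouter(placement)
    print(placement)
    router.mark_dead("node-a")
    print([router.route(s) for s in placement])
```

With four shards, three nodes and a replication factor of 2, losing any single node still leaves every shard with a live replica; adding nodes or raising the replication factor spreads the query load further at the cost of more copies of the index.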