It might be of interest to some that at Sindice.com we have switched from trying to index everything in SPARQL to a mixed approach: everything appears on the frontpage in real time, but only selected websites (RDF, RDFa, microformats, microdata, etc.) plus selected LOD datasets appear in SPARQL, which is updated regularly though not in real time.
This solution allows us to offer a reasonable quality of service while fitting within our limited research resources (Sindice.com is a research project). By providing this service we intend to foster experimentation by the community, who can now be sure that their favorite dataset is loaded (just send us a request) and can be queried, e.g. in SPARQL, next to their favorite web of data website (just make sure it's in the list of those indexed, or send us a request).

Some details of this mechanism (and the fact that it let us process 100M RDF docs in a day) are in this blog post:

http://blog.sindice.com/2012/07/18/how-we-ingested-100m-semantic-documents-in-a-day-and-were-do-they-come-from/

A UI making all of this clearer is coming in August.

Thanks must go to OpenLink for the support provided in setting this mechanism up, and to the others mentioned in the blog post.

Gio