Hello,
I'm currently evaluating solutions to index the contents of
~1TB of SEC (Securities and Exchange Commission) documents. File sizes range
from a few KB to a couple hundred KB. I started with Riak
because ease of setting up and expanding a cluster are primary requirements
(ElasticSearch and Solr will probably be evaluated as well).
Below are a few specific questions that I was hoping people could help with:
* In going through the search querying documentation, I haven't found a
way to extract the section of a result that contains the matches. I'm thinking
of something like Google's search results page, which shows an excerpt of the
page contents matching your query. Is something like this built in, so that
the application doesn't have to do it?
* Given that the documents total ~1TB of storage (not including the
generated indexes), does decreasing the n_val make sense? The documents are
mostly bulk inserted on a daily or weekly basis; other than that, all of the
operations are read-only.
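
To make the first question concrete, here is a minimal sketch of the kind of
excerpt extraction the application would have to do itself if nothing is built
in. It is plain Python and does not use any Riak API; the function name
`excerpt` and the `width` parameter are made up for illustration, and it simply
finds the first query term in the document text and returns a window around it.

```python
import re

def excerpt(text, query, width=60):
    """Return a short window of `text` around the earliest match of any
    whitespace-separated term in `query`, or None if nothing matches.
    Hypothetical helper for illustration only; not part of Riak."""
    # Escape each term so punctuation in the query is matched literally.
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return None
    match = re.search("|".join(terms), text, flags=re.IGNORECASE)
    if match is None:
        return None
    # Take `width` characters of context on each side of the match.
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    snippet = text[start:end].strip()
    # Mark truncation so the reader can tell the excerpt is partial.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + snippet + suffix
```

Doing this in the application means fetching the full document for every hit
just to build the excerpt, which is why I'd prefer it to happen on the search
side.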
Other than these specific questions, if anyone can provide general insight on
issues that would arise from a dataset like this within Riak, please feel free
to mention them.
Thanks,
--
Hector
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com