From the Riak Search documentation:

> Search queries use the same syntax as Lucene, and support most Lucene
> operators including term searches, field searches, boolean operators,
> grouping, lexicographical range queries, and wildcards (at the end of a
> word only)
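Because those range queries are lexicographical, numeric timestamps would presumably have to be zero-padded to a fixed width so that string order matches numeric order. Here is a minimal sketch of building such a query string. The field names `source` and `ts` and the helper names are hypothetical, the `[a TO b]` form is the standard Lucene inclusive range syntax, and the code only builds a string; it does not talk to Riak.

```python
def pad_ts(epoch_seconds: int, width: int = 12) -> str:
    """Zero-pad a non-negative timestamp so string order equals numeric order."""
    if epoch_seconds < 0:
        raise ValueError("timestamp must be non-negative")
    padded = str(epoch_seconds).zfill(width)
    if len(padded) > width:
        raise ValueError("timestamp does not fit in the chosen width")
    return padded


def range_query(source: str, t1: int, t2: int) -> str:
    """Build a Lucene-style query: one source, timestamps in [t1, t2]."""
    # 'source' and 'ts' are hypothetical field names for this sketch.
    return f"source:{source} AND ts:[{pad_ts(t1)} TO {pad_ts(t2)}]"


# Without padding, "9" sorts after "10" as a string; with padding it does not.
print(range_query("s1", 100, 2000))
```

Whether the "N first data points" limit can be pushed into the search itself is a separate question that this sketch does not answer.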
Besides, and this is something I'm looking at right now, we would need some
geographical queries.

On Tue, Aug 9, 2011 at 3:20 AM, Paul O <[email protected]> wrote:

> Pablo, is Riak Search able to do range queries?
>
> On Mon, Aug 8, 2011 at 5:17 PM, Pablo Chacin <[email protected]> wrote:
>
>> I'm facing a similar (yet not as extreme) use case. I'm also considering
>> a similar strategy, but I was thinking about using Riak Search instead of
>> an RDBMS for the secondary indexes.
>>
>> On Mon, Aug 8, 2011 at 10:25 PM, Paul O <[email protected]> wrote:
>>
>>> Hi Jeremiah,
>>>
>>> This is for a yet-to-exist system, so the existing data characteristics
>>> are not that important.
>>>
>>> The volume of data would be something like: average 10 events per second
>>> per source, meaning about 320 million events per source per year, for
>>> tens of thousands of sources, potentially hundreds of thousands.
>>>
>>> Data retention policy would be in the range of years, probably 5 years.
>>>
>>> Most of the above-mentioned are averages, some sources might be sampled
>>> even hundreds of times per second. There is also a layer of creating
>>> aggregates for "regressive granularity" (a la RRD) but it's a bit less of
>>> a concern (i.e. the same strategy I'm describing could be used for
>>> storing the aggregates.)
>>>
>>> The strategy I've described tries to make the most common query (time
>>> range per source with a max number of elements) predictable and as
>>> performant as possible. I.e. for any range I know at most three batches
>>> need to be read from Riak (or equivalent) so I can say that, if reading a
>>> batch takes 20 ms and the initial query takes 10 ms, I can predictably
>>> respond to most such requests in under 100 ms.
>>>
>>> So as long as I can benchmark individual aspects of the strategy I hope
>>> to have a predictable query cost and an idea of how to grow the system.
>>>
>>> As for the read to write ratio, I don't have an exact estimate (the
>>> system will be generic and consumption applications will be built on top
>>> of it) but the system is expected to be a lot more write intensive than
>>> read intensive. Most data might go completely unused, some data might be
>>> rather "hot" so additional caching might be implemented later, but I'm
>>> trying to design the underlying system so at least some performance
>>> axioms are computable.
>>>
>>> Does this clarify or confuse further?
>>>
>>> Regards,
>>>
>>> Paul
>>>
>>> On Mon, Aug 8, 2011 at 3:32 PM, Jeremiah Peschka <[email protected]> wrote:
>>>
>>>> It sounds like a potentially interesting use case.
>>>>
>>>> The questions that immediately enter my head are:
>>>> * How much data do you currently have?
>>>> * How much data do you plan to have?
>>>> * Do you have a data retention policy? If so, what is it? How do you
>>>>   plan to implement it?
>>>> * What's the anticipated rate of growth per day? Week? Year?
>>>> * What type of queries will you have? Is it a fixed set of queries? Is
>>>>   it a decision support system?
>>>> * What does your read to write ratio look like?
>>>>
>>>> Your plan to support Riak with a hybrid system isn't that out of whack;
>>>> it's very doable.
>>>>
>>>> You can certainly do the type of querying you've described through
>>>> careful choice of key names, sorting in memory, and only using the first
>>>> N data points in a given Map Reduce query result. The main reason to not
>>>> perform range queries in Riak is that they'll result in full key space
>>>> scans across the Riak cluster. If you're using bitcask as your backend
>>>> then it's an in-memory scan, otherwise you're doing a much more costly
>>>> scan from disk. And, since key names are hashed as they are partitioned
>>>> across the cluster, you're not going to get the benefit of sequential
>>>> disk scan performance like you might get with a traditional database.
>>>>
>>>> The only thing that worries me is the phrase "should grow more than what
>>>> a 'vanilla' RDBMS would support". Are you thinking 1TB? 10TB? 50TB?
>>>> 500TB? I'm trying to get a handle on what size and performance
>>>> characteristics you're looking for before diving into how to look at
>>>> your system vs. saying "Hell if I know, does someone else on the list
>>>> have a good idea?"
>>>>
>>>> ---
>>>> Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
>>>> Microsoft SQL Server MVP
>>>>
>>>> On Aug 8, 2011, at 11:21 AM, Paul O wrote:
>>>>
>>>> > Hello Riak enthusiasts,
>>>> >
>>>> > I am trying to design a solution for storing time series data coming
>>>> > from a very large number of potential high-frequency sources.
>>>> >
>>>> > I thought Riak could be of help, though based on what I read about it
>>>> > I can't use it without some other layer on top of it.
>>>> >
>>>> > The problem is I need to be able to do range queries over this data,
>>>> > by the source. Hence, I want to be able to say "give me the N first
>>>> > data points for source S between time T1 and time T2."
>>>> >
>>>> > I need to store this data for a rather long time, and the expected
>>>> > volume should grow more than what a "vanilla" RDBMS would support.
>>>> >
>>>> > Another thing to note is that I can restrict the number of data points
>>>> > to be returned by a query, so no query would return more than MaxN
>>>> > data points.
>>>> >
>>>> > I thought about doing this the following way:
>>>> >
>>>> > 1. Bundle date time series in batches of MaxN, to ensure that any
>>>> >    query would require reading at most two batches. The batches would
>>>> >    be stored inside Riak.
>>>> > 2. Store the start-time, end-time, size and Riak batch ID in a MySQL
>>>> >    (or PostgreSQL) DB.
>>>> >
>>>> > My thinking is such a strategy would allow me to persist data in Riak
>>>> > and linearly grow with the data, and the index would be kept in an
>>>> > RDBMS for fast range queries.
>>>> >
>>>> > Does it sound sensible to use Riak this way? Does this make you
>>>> > laugh/cry/shake your head in disbelief? Am I overlooking something
>>>> > from Riak which would make all this much better?
>>>> >
>>>> > Thanks and best regards,
>>>> >
>>>> > Paul
>>>> > _______________________________________________
>>>> > riak-users mailing list
>>>> > [email protected]
>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
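
For what it's worth, the batch-plus-index strategy from the original post can be sketched in a few lines. This is only an illustration under stated assumptions, not anyone's actual implementation: a plain dict stands in for Riak, a sorted in-memory list stands in for the MySQL/PostgreSQL index table, timestamps are assumed to be strictly increasing per source, and every name here (`SeriesStore`, `IndexRow`, `MAX_N`) is made up. Points still sitting in the write buffer are not visible to queries in this sketch.

```python
from bisect import bisect_left
from dataclasses import dataclass

MAX_N = 4  # hypothetical cap on points per batch and per query result


@dataclass
class IndexRow:
    start: int     # timestamp of the first point in the batch
    end: int       # timestamp of the last point in the batch
    size: int      # number of points in the batch
    batch_id: str  # key of the batch in the key/value store


class SeriesStore:
    def __init__(self):
        self.batches = {}  # batch_id -> [(ts, value)]; stand-in for Riak
        self.index = {}    # source -> [IndexRow] in time order; stand-in for SQL
        self.pending = {}  # source -> buffered points not yet written

    def append(self, source, ts, value):
        """Buffer one point and write out a batch once MAX_N points are held."""
        buf = self.pending.setdefault(source, [])
        buf.append((ts, value))
        if len(buf) == MAX_N:
            batch_id = f"{source}:{buf[0][0]}"
            self.batches[batch_id] = list(buf)
            self.index.setdefault(source, []).append(
                IndexRow(buf[0][0], buf[-1][0], len(buf), batch_id))
            buf.clear()

    def query(self, source, t1, t2, n):
        """Return (first n points with t1 <= ts <= t2, batches read)."""
        n = min(n, MAX_N)
        rows = self.index.get(source, [])
        # Index lookup: first batch whose end time is >= t1.
        i = bisect_left([r.end for r in rows], t1)
        out, reads = [], 0
        while i < len(rows) and rows[i].start <= t2 and len(out) < n:
            reads += 1  # one fetch from the key/value store
            for ts, value in self.batches[rows[i].batch_id]:
                if t1 <= ts <= t2 and len(out) < n:
                    out.append((ts, value))
            i += 1
        return out, reads


store = SeriesStore()
for ts in range(20):
    store.append("s", ts, ts * 10)
print(store.query("s", 2, 100, 4))
```

With batches that are always full, a query in this sketch never reads more than two of them, which matches the "at most two batches" claim. Using the figures quoted earlier in the thread (10 ms for the index query, 20 ms per batch read), two reads come to about 50 ms and three to about 70 ms, so the 100 ms target leaves some headroom.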
