Re: [riak-user]Cannot startup riak node correctly after successful installation
YouBarco writes: Hello, My OS is ubuntu 14.04 64bit, and installed erlang from source with version R16B as following: That's your problem. You MUST use the custom Basho branch of Erlang/OTP with Riak. If you insist on building Erlang/Riak from source then follow this guide for Erlang: http://docs.basho.com/riak/latest/ops/building/installing/erlang/ bad scheduling option -sfwi This flag was added by Basho to the 16B series. IIRC, vanilla 16B02 includes this flag but you still should use Basho's custom branch since it has other fixes required by Riak. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: risk attach not working properly after operation
Oliver Soell writes: That was all well and good, but now “riak attach” isn’t giving me the love I thought I should get: (c_1494_riak@172.29.18.183)1 {ok, Ring} = riak_core_ring_manager:get_my_ring(). ** exception error: no match of right hand side value {error,no_ring} (c_1494_riak@172.29.18.183)2 This is because attach is a remote shell (a change introduced in 2.0.0 I think). Either you need to use an rpc call or you can used riak `attach-direct`. If you do the later remember to use Ctrl-D to exit, not Ctrl-C. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna - inconsistent number of documents found for the same query
Eric Redmond writes: This is a known issue, and we're still working on a fix. https://github.com/basho/yokozuna/issues/426 I don't see how this issue is related to Oleksiy's problem. There is no mention of removing or adding nodes. I think the key part of Oleksiy's report is the association of an index _after_ data had already been written. That data is sometimes missing. These two issues could be related but I don't see anything in that GitHub report to indicate why. On Nov 29, 2014, at 9:26 AM, Oleksiy Krivoshey oleks...@gmail.com wrote: 1. Create a bucket, insert some keys (10 keys - KeysA) 2. Create Yokozuna Index, associate it with the bucket 3. Add or update some new keys in the bucket (10 keys - KeysB) 4. Wait for Search AAE to build and exchange the trees Now when I issue a search query I will always get all 10 KeysB but a random amount of KeysA, for example the same query repeated 5 times may return: 10 KeysB + 2 KeysA 10 KeysB + 0 KeysA 10 KeysB + 7 KeysA 10 KeysB + 1 KeysA 10 KeysB + 10 KeysA Are there any errors in the logs? Does the count go up if you wait longer? What does `riak-admin search aae-status` show? -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna error during indexing
Oleksiy Krivoshey writes: Hi, I have enabled Yokozuna on existing Riak 2.0 buckets and while it is still indexing everything I've already received about 50 errors like this: emulator Error in process 0.26807.79 on node 'riak@10.0.1.1' with exit value: {{badmatch,false},[{base64,decode_binary,2,[{file,base64.erl},{line,211}]},{yz_solr,to_pair,1,[{file,src/yz_solr.erl},{line,414}]},{yz_solr,'-get_pairs/1-lc$^0/1-0-',1,[{file,src/yz_solr.erl},{line,411}]},{yz_solr,'-get_pairs/1-lc$^0/1-0-'... Can someone please describe what does it mean? I'm fairly certain the base64 library in Erlang is indicating that you have a truncated base64 string. https://github.com/basho/otp/blob/OTP_R16B02_basho6/lib/stdlib/src/base64.erl#L211 You should be able to attach to the Riak console and run the following command to get the base64 string: redbug:start(yz_solr:to_pair - return). That will give you the Type/Bucket/Key and the base64 string that is causing the issue. Knowing that info can help you confirm the issue and perhaps figure out what it is happening. Are you using a custom schema? -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna error during indexing
Oleksiy Krivoshey writes: Yes, I'm using custom schema and custom bucket type. There are many (over 500) buckets of this type. Did you modify any of the _yz_* fields? Your command have returned the following tuple: (riak@10.0.1.1)1 redbug:start(yz_solr:to_pair - return). {1919,1} redbug done, timeout - 0 How do I get bucket/key/base64_string from this? Yea, this needs to be running when you happen to come across a bad value. Try a higher timeout and hope you get lucky. redbug:start(yz_solr:to_pair - return, [{time, 60}]). That should let it run for 10 minutes. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna error during indexing
Oleksiy Krivoshey writes: Still, what kind of base64 string can it be? I don't have anything base64 encoded in my data, its a pure JSON objects stored with content_type: 'application/json' The _yz_* fields (which need to be part of your schema and defined exactly as defined in the default schema) are generated as part of indexing. The entropy data field (_yz_ed) uses a base64 of the object hash so that hashtrees may be rebuilt for the purpose of Active Anti-Entropy (AAE). My guess is somehow this value if getting truncated or corrupted along the way. https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml#L111 This code is only executed when rebuilding AAE trees. What is the output from the following? riak-admin search aae-status -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna error during indexing
Oleksiy Krivoshey writes: Entropy Trees Index Built (ago) --- 11417981541647679048466287755595961091061972992-- 57089907708238395242331438777979805455309864960-- 102761833874829111436196589800363649819557756928 -- 148433760041419827630061740822747494183805648896 -- 194105686208010543823926891845131338548053540864 10.4 hr 239777612374601260017792042867515182912301432832 -- 285449538541191976211657193889899027276549324800 -- 650824947873917705762578402068969782190532460544 -- 696496874040508421956443553091353626554780352512 -- 742168800207099138150308704113737470919028244480 -- 787840726373689854344173855136121315283276136448 -- 833512652540280570538039006158505159647524028416 -- 879184578706871286731904157180889004011771920384 -- 924856504873462002925769308203272848376019812352 -- 970528431040052719119634459225656692740267704320 -- 1016200357206643435313499610248040537104515596288 -- 1061872283373234151507364761270424381468763488256 9.4 hr 1107544209539824867701229912292808225833011380224 -- 1153216135706415583895095063315192070197259272192 -- 119061873006300088960214337575914561507164160 12.1 hr 1244559988039597016282825365359959758925755056128 -- 1290231914206187732476690516382343603290002948096 -- 1335903840372778448670555667404727447654250840064 11.4 hr 1381575766539369164864420818427111292018498732032 -- 1427247692705959881058285969449495136382746624000 -- So it seems many of these trees are not building because of this issue. The system will keep trying to build but it will fail every time because of the bad base64 string. Trying to catch this with redbug will prove difficult too because it automatically shuts itself off after X events. That can be changed but then you have to dig through a mountain of output. Not a fun way to do things. How comfortable are you with Erlang/Riak? Enough to write a bit of code and hot-load it into your cluster? -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna error during indexing
Oleksiy Krivoshey writes: Got few results: I don't see anything wrong with the first record, but the second record mentions the key '/.Trash/MT03' which is not correct, the correct key that exists in that bucket is '/.Trash/MT03 348 plat frames' You have found a bug in Yokozuna. https://github.com/basho/yokozuna/blob/develop/src/yz_doc.erl#L230 https://github.com/basho/yokozuna/blob/develop/java_src/com/basho/yokozuna/handler/EntropyData.java#L139 It foolishly assumes there is no space character used in the type, bucket, or key names. As a workaround I think you'll have to make sure your application converts all spaces to some other character (like underscore) before storing in Riak. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Solution for Riak 500 Internal Server Error
On Oct 14, 2014, at 3:53 AM, ayush mishra ayushmishra2...@gmail.com wrote: http://www.dzone.com/links/r/solution_for_riak_500_internal_server_error.html ___ I recommend _not_ using legacy Riak Search on Riak 2.x. Why was the legacy search pre-commit hook installed in the first place? Are you trying to use search? Documentation for new search: http://docs.basho.com/riak/latest/dev/using/search/ -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna Scale
On Sep 18, 2014, at 4:20 AM, anandm an...@zerebral.co.in wrote: Yes - looks like its going that way too - a decoupling between the Solr Cloud and the Riak Cluster seems like a solution that could work out and then Yokozuna to index content out to the Solr Cloud (completely external to riak - Solr not made to baby sit in Riak) - In this arrangement we could maintain index distribution with Solr in a Sharded env (over implicit or composite id collections) and riak used just as a kv. Yokozuna could also be used as a front to Solr Cloud - and it will search on Solr and fetch the matching docs from Riak and returned the merged doc back to the client. Hi, creator of Yokozuna here. I just want to make it clear that Yokozuna does not use SolrCloud. It uses regular old Solr and Riak does the sharding replication. Yokozuna uses Solr’s Distributed Search which is _not_ SolrCloud. It uses Riak Core coverage to build the query plan and feeds that into Solr’s Distributed Search. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna Scale
On Sep 18, 2014, at 10:35 AM, anandm an...@zerebral.co.in wrote: Yes Ryan - that aspect is pretty clear. So with 1-1 riak-solr Yokozuna deployment scale to my requirement? Am I missing something here when thinking it wouldn't? I haven’t followed this thread closely, it was just your last email that caught my eye. The only way you'll know if it scales is if you try. The one thing I might worry about is the heap usage. I'm not sure if Yokozuna will allow it but you might try tweaking the the schema so that the `_yz_*` fields use on-disk DocValues. IIRC, this was a change I was thinking of making to reduce the heap pressure (at the potential cost of extra I/O? Honestly it's been months since I've thought hard about this stuff. There is one person that I know of who has pushed Yokozuna a fair bit and that is Wes Brown. Perhaps you can track him down and get some hard-won answers: http://basho.com/rubicon-io-uses-riak-to-provide-real-time-threat-analysis/ https://github.com/basho/yokozuna/issues?q=is%3Aissue+author%3Awbrown -Z___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Optimistic Locking with Riak Search / is _version_ required in schema.xml?
On Aug 7, 2014, at 5:46 PM, David James davidcja...@gmail.com wrote: Is _version_ required? It should not be required as the documentation says it is only needed for real-time GET which Riak Search (Yokozuna) disables since Riak KV provides the get/put implementation. I see SolrCloud mentioned in some documentation (see below)? Does Riak Search use it? RS does not make use of SolrCloud at all. It uses Solr’s Distributed Search but that is something that existed well before SolrCloud. All routing and replica administration is handled by Riak. Each Solr instance (one per node) has no awareness of the other nodes except for the explicit distributed queries sent by Riak. How does Riak Search handle optimistic locking? It doesn’t use Solr’s optimistic locking at all. All key-value semantics come from Riak itself. RS simply indexes an object’s values. See this comment on the default_schema.xml on Github: !-- TODO: is this needed? -- field name=_version_ type=long indexed=true stored=true/ https://raw.githubusercontent.com/basho/yokozuna/develop/priv/default_schema.xml Yes, I wrote that TODO. It is one of many that founds its way into 2.0.0 :). You should run fine without this field if you create a custom schema. P.S. Per https://wiki.apache.org/solr/SchemaXml _version_ Solr4.0 - This field is used for optimistic locking in SolrCloud and it enables Real Time Get. If you remove it you must also remove the transaction logging from solrconfig.xml, see Real Time Get. Just to reiterate what I said above, RS disables the transaction logging and thus there is no real time get. There is no reason for it since that is what Riak itself provides. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search 2.0 Questions
On Jul 24, 2014, at 12:29 PM, Andrew Zeneski and...@andrewzeneski.com wrote: Been checking out 2.0rc1 and am really excited about the new features (as I think most of us are). I had a couple of questions that I couldn't find answers to scanning the docs. Totally possible I missed it and if so, please feel free to direct me to the proper place. 1. Is there a way to remove a search index and schema? No, currently you can only store/update a schema. 2. Do indexes just reference schemas? An index has an associated schema. When the index is created locally on a node it retrieves that schema from an internal store built into Riak and writes it to the directory specific to that index. The index uses the schema stored in its local directory. Updates to the schema are not automatically propagated to the local file. More specifically, if I update a schema will those changes propagate to all indexes using that schema? No, you will need to either need to delete the index and recreate or attach to the Riak console and run the following command: rp(yz_index:reload(index_name)). This command will fetch the latest version of the associated schema for the index “index_name”, overwrite the index’s local schema, and then reload that index across the entire cluster. Why is this all so awkward? Some of the gory details can be found in these two issues if you really want to know: https://github.com/basho/yokozuna/issues/130 https://github.com/basho/yokozuna/issues/403 The reason I ask is I've been experimenting with simple searching and found in the logs an error indexing a document due to unknown fields. I realized I missed the catch all dynamic field in my schema and updated it. After updating I ran the test again (after deleting any existing data) but the error persists. Leading me to believe that the schema isn't updating. But when I view the schema $RIAK_HOST/search/schema/testschema I see the updates. Yes, the schema itself has been updated but, as explained above, it is once removed from the index and not automatically reloaded. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna search
On Jul 23, 2014, at 10:02 AM, Sean Cribbs s...@basho.com wrote: . In this case, no, you cannot use wildcards at the beginning [1]. [1] http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Wildcard Searches Actually, you can place the wildcards * or ? anywhere, it doesn’t matter. When placing it at the start it just means the entire term index will have to be searched to determine if the term exists. A common trick veteran Lucene/Solr users will use is to index all terms both forward and backwards, that way you can turn a postfix query (e.g. *ly) into a prefix query (e.g. yl*) [1]. On Wed, Jul 23, 2014 at 4:22 AM, Alexander Popov mogada...@gmail.com wrote: Will queries support masks at beging and 1 char mask like *lala and a* Yes, it absolutely will. As Sean said, Yokozuna uses Solr and therefore gives all the same functionality so long as that query type is supported by Solr’s distributed search (and the most important stuff is [2]). Yokozuna uses Solr 4.7.0, the Solr Reference Guide is a great place to learn more about Solr [3]. -Z [1]: http://stackoverflow.com/questions/8515190/solr-reverse-wildcard-field-association [2]: https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations [3]: https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.7.pdf___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Endless AAE keys repairing
On Jul 17, 2014, at 4:30 AM, Daniil Churikov ddo...@gmail.com wrote: Hello, In our test env we have 3 nodes riak 1.4.8-1 cluster on debians. According to logs: 2014-07-17 02:48:03.748 [info] 0.10542.85@riak_kv_exchange_fsm:key_exchange:206 Repaired 1 keys during active anti-entropy exchange of {936274486415109681974235595958868809467081785344,3} between {936274486415109681974235595958868809467081785344,'riak@10.3.13.96'} and {981946412581700398168100746981252653831329677312,'riak@10.3.13.96'} Messages like this constantly appears, there is not so much load on this test cluster and I expected that eventually everything will be fixed, but this messages keep coming from day to day. In the past we had several issues with one of the cluster participants and as a result we did enabled AAE to fix it. What could be possble the reason of this? This is probably caused by regular puts. When AAE performs an exchange it takes snapshots of each tree in a concurrent manner. This means that a snapshot could occur while replicas for a given object are still in flight. For example: 1. User writes object O. 2. Coordinator sends O to 3 partitions A, B, and C. 3. Partition A accepts O and updates hash tree. 4. Entropy manager on node which own partition A decides to perform an exchange between A and B. 5. Snapshot is taken of hash tree for A. 6. Snapshot is taken of hash tree for B. 7. Partition B accepts O and updates hash tree (but the update is not reflected in the snapshot just taken) 8. Partition C accepts O and updates hash tree. 9. Exchange between A B determines object is missing on B and performs a read repair. 10. Read repair notices that object O exists on all three partitions and there is nothing to be done. The higher the load the more keys that could be included in one snapshot but not the other. I would say that any time your cluster is accepting writes it might be normal to see a handful of keys getting “repaired”. But if you see, say, more than 10 (especially if there are 0 outstanding writes) then that is probably a sign of real repair. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: AAE problems
On Tue, Jun 17, 2014 at 12:46 PM, István lecc...@gmail.com wrote: The entire dataset is idempotent and immutable, so there is not even a slightest chance that we are ending up with different values on different nodes for the same key in the same bucket. It seems that anti-entropy still finds problems: /var/log/riak/console.log.4:2014-06-11 06:11:41.756 [info] 0.6776.6003@riak_kv_exchange_fsm:key_exchange:206 Repaired 1 keys during active anti-entropy exchange of {536645132457440915277915524513010171279912730624,3} between {548063113999088594326381812268606132370974703616,'riak@10.1.11.120'} and {559481095540736273374848100024202093462036676608,'riak@10.1.11.121'} AAE exchange uses snapshots of the trees. The snapshots on each node will happen concurrently. If your cluster is servicing writes as these snapshots are made then there is a chance a snapshot will be made on one node containing keys X,Y,Z and on the other node which has only seen keys X Y. My question would be: Is there any reason to let AAE running if we don't mutate the data in place? YES. Immutable data provides nice semantics for your application but does _nothing_ to save you from the whims of the stack your application runs on. Operating systems, file systems, and hardware all have subtle ways to corrupt your data both on disk and in memory. Immutable data also doesn't help in the more practical case where the network decides to drop packets and a write only makes it to some of the nodes. Is there any way knowing what is causing the difference according to AAE between two nodes? There is but it requires attaching to Riak and running some diagnostic commands _when_ a repair takes place. I'm not sure it will give you any insight though. It will either say: 1) remote missing, 2) local missing or 3) hashes are different. I was thinking about how this could potentially happen and I am wondering if the Java client pb interface supports R and W values, so I could make sure that a write goes in with W=(the number of nodes we have). Doubt this will help with the concurrency problem I discussed above but it will mean your application has a stronger guarantee of how many copies made it to the nodes. If you want to make sure they are durable then I would use DW if Java exposes it [1]. [1]: See the optional query parameters for difference between W, DW, and PW. http://docs.basho.com/riak/latest/dev/references/http/store-object/#Request -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search sort error in 2.0.0beta1
This seems like a bug to me. I created an issue to track it. https://github.com/basho/yokozuna/issues/372 On Thu, May 1, 2014 at 5:22 PM, Troy Melhase t...@troy.io wrote: Hello again! I've narrowed this down to the interaction between the sort parameter and the field list (fl) parameter. It seems that if fl is supplied with sort, the field list must contain the value score. I'm not certain if that's a bug or not, but the work-around is plain enough: add score to the list of fields if there's a sort and the field list isn't empty. Whew! troy On Wed, Apr 30, 2014 at 10:09 PM, Troy Melhase t...@troy.io wrote: Hello! I'm getting an error when I include a sort parameter in a RpbSearchQueryReq message. I'm using Riak 2.0.0beta1. Source build and macos binaries show the same behavior. The error doesn't happen at all if I don't specify a search parameter. For the parameter value, I'm using field direction (e.g., name asc). Leaving off the direction, or encoding the space as + or %20 produces a Solr error. I've tried Golang and Python clients to see if it was a client issue. Both clients produce the exact same error; that error text is at the end of this message. Is this a known bug? I searched Github and couldn't find any issues that look like this one. Is there a work-around? Or better yet, am I doing something wrong? Thanks! troy Error text: RiakError: 'Error processing incoming message: error:badarg:[{protobuffs,encode_internal, [2,[],float], [{file,src/protobuffs.erl}, {line,167}]}, {riak_search_pb,iolist,2, [{file, src/riak_search_pb.erl}, {line,63}]}, {riak_search_pb,encode,2, [{file, src/riak_search_pb.erl}, {line,48}]}, {riak_pb_codec,encode,1, [{file, src/riak_pb_codec.erl}, {line,77}]}, {yz_pb_search,encode,1, [{file, src/yz_pb_search.erl}, {line,60}]}, {riak_api_pb_server, send_encoded_message_or_error, 3, [{file, src/riak_api_pb_server.erl}, {line,498}]}, {riak_api_pb_server, process_message,4, [{file, src/riak_api_pb_server.erl}, {line,430}]}, {riak_api_pb_server, connected,2, [{file, src/riak_api_pb_server.erl}, {line,262}]}]' ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: how to set eDisMax? Solr start and rows not working properly
On Sun, Apr 20, 2014 at 9:57 AM, Buri Arslon buri...@gmail.com wrote: Hi guys! I searched the docs and the source code but wasn't able to find any info about using edismax. I have 2 questions: 1. How to set edismax parser? You can make use of LocalParams syntax in order to use different query parsers. For example: {!edismax}my query http://wiki.apache.org/solr/QueryParser http://wiki.apache.org/solr/LocalParams 2. Why are start and rows not working properly? I'll get back to your sort question when I have a chance to verify on my side. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak 2.0 search changes
Alexander, 1. Does it support wildcards in a middle or start? *abc, a*bc, Riak Search 2.0 (Yokozuna) is based on Apache Solr. Any queries supported by Solr's distributed search are supported by Search 2.0 over HTTP. The PB API has not been altered for Search 2.0 (with the exception of presort) so if you want to use features like facets you'll have to use HTTP for now. https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide For wildcard searches in particular see the following section: https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser 2. Does presort supports any field instead of key or score? There is no presort option for Search 2.0. Presort is a work around for sorting issues in the current Search [1,2]. Solr sorts properly. Although, depending on the fields sorted, the results can become inconsistent for the same query over time because of a bug in Search 2.0 [3]. I'm not found this in 2.0 docs. These are the Riak Search 2.0.0beta1 docs. http://docs.basho.com/riak/2.0.0beta1/dev/using/search/ -Z 1: https://github.com/basho/riak_search/pull/54 2: http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004219.html 3: https://github.com/basho/yokozuna/issues/355 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak search fails to index via riakc_pb client
Index via the erlang client (error) === Eshell V5.9.2 (abort with ^G) 1 {ok, Conn} = riakc_pb_socket:start_link(localhost, 8087). {ok,0.34.0} 2 Body = {\name_s\:\tom\}. {\name_s\:\tom\} 3 Object2Store = riakc_obj:new({testtype,somebucket},1,Body). {riakc_obj,{testtype,somebucket}, 2,undefined,[],undefined, {\name_s\:\tom\}} 4 ok = riakc_pb_socket:put(Conn,Object2Store). ok The object is stored (in riak), but not indexed (in solr): $ curl http://localhost:8098/types/testtype/buckets/somebucket/keys/ 1 {name_s:tom} == /../riak-yokozuna-0.14.0-src/rel/riak/log/console.log == 2014-03-31 16:26:11.568 [error] 0.1448.0@yz_kv:index:204 failed to index object {{testtype,somebucket},1} with error badarg because [{dict,fetch,[content-type,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[[dot|{35,9,254,249,83,57,16,45,{1,63563491571}}]],[],[],[],[],[],[[X-Riak-VTag,53,117,84,48,117,57,84,88,70,88,98,79,53,77,77,76,103,67,102,55,100,74]],[[index]],[],[[X-Riak-Last-Modified|{1396,272371,362491}]],[],[]}}}],[{file,dict.erl},{line,125}]},{yz_doc,extract_fields,1,[{file,src/yz_doc.erl},{line,99}]},{yz_doc,make_doc,5,[{file,src/yz_doc.erl},{line,71}]},{yz_doc,'-make_docs/4-lc$^0/1-0-',5,[{file,src/yz_doc.erl},{line,60}]},{yz_kv,index,7,[{file,src/yz_kv.erl},{line,249}]},{yz_kv,index,3,[{file,src/yz_kv.erl},{line,191}]},{riak_kv_vnode,actual_put,6,[{file,src/riak_kv_vnode.erl},{line,1391}]},{riak_kv_vnode,perform_put,3,[{file,src/riak_kv_vnode.erl},{line,1380}]}] You failed to provide a content-type when building the object. It's not easy to see if you aren't used to Erlang but the error in the log shows a failure to find the key content-type in the object's metadata. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: updating the default bucket-type to support Solr indexing?
Actually, a bucket type is not required. To associate an index with a bucket you simply set the `search_index` property. Bucket types provide a method for many buckets to inherit the same properties. Setting the search_index property on the bucket type is a way to index multiple buckets under one index without setting the property on each bucket. Otherwise, I would suggest setting the property at the bucket level and not the type level. The specific problem Paul ran into is that he tried to change properties for the default type which is a special type that cannot be altered. -Z On Mon, Mar 24, 2014 at 11:52 AM, Luke Bakken lbak...@basho.com wrote: Hi Paul, You are correct, a new bucket type must be created for Riak 2.0 search indexes. -- Luke Bakken CSE lbak...@basho.com On Mon, Mar 24, 2014 at 2:27 AM, Paul Walk p...@paulwalk.net wrote: I'm experimenting with the technology preview of Riak 2.0, using an existing Ruby web application which uses the official Ruby client gem (1.4.3). My understanding is that if I have not specified particular bucket-types, then my buckets are implicitly using a 'default' bucket type. So, in order to try the all-new search functionality, I have tried to associate the default bucket-type with a search index, thus: ./riak-admin bucket-type update default '{props:{search_index:my_index}}' which returns the error: Error updating bucket type default: no_default_update Does this mean that in Riak 2.0, if one wants buckets to be indexed in Solr, one must create a new bucket_type in order to associate an index and then associate buckets with this? Thanks, Paul --- Paul Walk http://www.paulwalk.net --- ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: updating the default bucket-type to support Solr indexing?
There is support for indexing CRDTs. Field definitions are defined in the default schema: https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml#L96 On Mon, Mar 24, 2014 at 12:33 PM, Paul Walk p...@paulwalk.net wrote: Thanks Luke. If I might be allowed a follow-on question, what is the effect of adding an index to a 'typed bucket-type'. For example, if I define a bucket_type as follows: ./riak-admin bucket-type create map_bucket_type '{props:{search_index:my_index,datatype:map}}' Are the members of any maps stored in a bucket which uses this bucket-type going to get indexed in Solr? I would assume that some sort of mashalling function and a custom schema would be required? Thanks, Paul On 24 Mar 2014, at 15:52, Luke Bakken lbak...@basho.com wrote: Hi Paul, You are correct, a new bucket type must be created for Riak 2.0 search indexes. -- Luke Bakken CSE lbak...@basho.com On Mon, Mar 24, 2014 at 2:27 AM, Paul Walk p...@paulwalk.net wrote: I'm experimenting with the technology preview of Riak 2.0, using an existing Ruby web application which uses the official Ruby client gem (1.4.3). My understanding is that if I have not specified particular bucket-types, then my buckets are implicitly using a 'default' bucket type. So, in order to try the all-new search functionality, I have tried to associate the default bucket-type with a search index, thus: ./riak-admin bucket-type update default '{props:{search_index:my_index}}' which returns the error: Error updating bucket type default: no_default_update Does this mean that in Riak 2.0, if one wants buckets to be indexed in Solr, one must create a new bucket_type in order to associate an index and then associate buckets with this? Thanks, Paul --- Paul Walk http://www.paulwalk.net --- ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com --- Paul Walk http://www.paulwalk.net --- ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search Index Not Found
On Sat, Mar 22, 2014 at 2:57 PM, Buri Arslon buri...@gmail.com wrote: another weird thing I noticed is that after I restart riak, get_search_index returns {ok, Index}, but after a few seconds, it's going back to {error, notfound} Do you see any errors related to that index in the solr.log file? ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: How to disassociate a bucket with a yokozuna index?
On Tue, Mar 11, 2014 at 4:17 AM, EmiNarcissus eminarcis...@me.com wrote: Now I'm working with riak 2.0 pre17, have tried both set bucket property search_index to other index or _dont_index_, but still cannot delete the index. Failure: riakasaurus.exceptions.RiakPBCException: Can't delete index with associate buckets [riakasaurus.tests.test_search] (0) What are the bucket properties for that bucket? I have a hunch of what it might be but need to see the properties to verify. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: How to disassociate a bucket with a yokozuna index?
On Sun, Mar 9, 2014 at 8:31 AM, EmiNarcissus eminarcis...@me.com wrote: I'm testing on yokozuna api now, but found every time I call delete_search_index function it alerts cannot delete because of pre-existed associated bucket. I've tried to set the bucket search-index to another index, still have the same error. Is this part not implemented yet? or is something I missed from? Hi Tim, An index may not be deleted if it has any buckets associated with it. The 'search_index' property (not 'search-index') must be changed to either a different index or the sentinel value '_dont_index_'. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Yokozuna 0.14.0
It was just pointed out to me that the links in the INSTALL doc were wrong. The packages have been moved off my s3 account into the main Basho S3 location. http://s3.amazonaws.com/files.basho.com/yokozuna/pkgs/riak-yokozuna-0.14.0-src.tar.gz http://s3.amazonaws.com/files.basho.com/yokozuna/pkgs/riak-yokozuna-0.14.0-src.tar.gz.sha1 The latest install doc has the corrected links. https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md#source-package -Z On Mon, Feb 24, 2014 at 12:36 PM, Ryan Zezeski rzeze...@basho.com wrote: Riak Users, I'm happy to announce the Yokozuna 0.14.0 release. It brings an upgrade to Solr 4.6.1 as well as a slew of bug fixes and internal enhancements. There are breaking changes made in this release so if you are one of the brave souls using riak2.0.0 pre5/pre11 or a previous Yokozuna source release then a rolling upgrade to 0.14.0 may not go smoothly. https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/RELEASE_NOTES.md#0140 https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/INSTALL.md -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.14.0
Riak Users, I'm happy to announce the Yokozuna 0.14.0 release. It brings an upgrade to Solr 4.6.1 as well as a slew of bug fixes and internal enhancements. There are breaking changes made in this release so if you are one of the brave souls using riak2.0.0 pre5/pre11 or a previous Yokozuna source release then a rolling upgrade to 0.14.0 may not go smoothly. https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/RELEASE_NOTES.md#0140 https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/INSTALL.md -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: yokozuna Issues
Hello Bryce, On Wed, Feb 19, 2014 at 3:27 PM, Bryce Verdier bryceverd...@gmail.comwrote: Hey Hector, Thank you for looking into this, here is the response to 'java -version' on my machine: java -version java version 1.7.0_51 OpenJDK Runtime Environment (fedora-2.4.5.0.fc19-x86_64 u51-b31) OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode) I've been meaning to try buiding from source to see if the problem cropped up again, just haven't had the time yet. (I noticed that I didn't have the same issue when I built from source on my archLinux desktop --2.0pre14 --. Not sure if its related, but I just wanted to make sure). 2014-02-13 08:59:56.225 [info] 0.547.0@yz_solr_proc:handle_info:134 solr stdout/err: Caused by: java.lang.UnsupportedClassVersionError: com/basho/yokozuna/monitor/Monitor : Unsupported major.minor version 52.0 This is saying that com.basho.yokozuna.monitor.Monitor was compiled with J2SE1.8. That will not work with your 1.7 JRE. If you compile from source then you won't have this issue. The problem is that Yokozuna has some custom Solr handlers and they are compiled independently for each separate official Riak builder we have. In this case our Fedora builder has javac 1.8.0-internal. This is my fault. The importance of the compiling JDK and our builders totally slipped my mind. Yokozuna needs to be changed so that we just compile the JAR once and include it as part of the official build process. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: riak crashing when indexing for search
Hi Glory, On Tue, Feb 11, 2014 at 1:29 AM, Glory Lo gloryl...@gmail.com wrote: While indexing it seem to run fine part way.. then I noticed it hangs (it freezed my machine on a couple of attempts on linux mint 13). Then it crashes. I have 3 nodes running and I only tried indexing one of them doing a search-cmd mybucket dev1/data/leveldb What was the process for indexing? How much data were you indexing? What content-type? How big is each object? What is your schema? My crash log has multiple errors of different sorts which I haven't discern yet. However, the last errors w/ a close timestamp are as follows which mentions some timeouts (likely with the freeze): It's hard to discern ripple effect errors from the the origin error. I see some stuff that is indicative of disk corruption but there's a good chance that only happened because some other error caused merge_index to hard crash. Could you attach a tar.gz of all your logs? 2014-02-08 23:15:53 =ERROR REPORT Error in process 0.2799.1 on node 'dev1@127.0.0.1' with exit value: {badarg,[{ets,lookup,[145752322,{1118962191081472546749696200048404186924073353216,' dev2@127.0.0.1 '}],[]},{riak_search_client,'-process_terms_1/4-fun-2-',3,[{file,src/riak_search_client.erl},{line,295}]},{riak_search_utils,'-ptransform/2-fun-0-',2,[{file,src/riak_search_utils This is an error finding the temporary ETS table for building the postings list. That's a really interesting error to have and makes me wonder if you someone hit the ETS system limit. I'm not even sure that is possible given how high we've raised the default limit. 2014-02-08 23:18:46 =ERROR REPORT Error in process 0.2350.1 on node 'dev1@127.0.0.1' with exit value: {terminated,[{io,format,[17869.23.0,DEBUG: ~p:~p - ~p~n~n ~p~n~n,[riak_search_dir_indexer,194,{ error , Type , Error , erlang : get_stacktrace ( ) },{error,error,{case_clause,{error,timeout}},[{riak_search_client,'-index_docs/1-fun-0-'... I'm actually a bit baffled exactly what this trace is saying. I think more detail might be in the error.log. 2014-02-08 23:20:00 =ERROR REPORT Error in process 0.4231.1 on node 'dev1@127.0.0.1' with exit value: {{case_clause,{data,4711}},[{cpu_sup,get_uint32_measurement,2,[{file,cpu_sup.erl},{line,227}]},{cpu_sup,measurement_server_loop,1,[{file,cpu_sup.erl},{line,585}]}]} Yikes, this looks really bad and makes me wonder if this is an environment issue as this error should not be related to search. 2014-02-08 23:23:37 =ERROR REPORT Error in process 0.6359.1 on node 'dev1@127.0.0.1' with exit value: {badarg,[{erlang,binary_to_term,[31359 bytes],[]},{mi_segment,iterate_all_bytes,2,[{file,src/mi_segment.erl},{line,167}]},{mi_segment_writer,from_iterator,4,[{file,src/mi_segment_writer.erl},{line,102}]},{mi_segment_writer,from_iterator... This is typically what you see when data corruption occurs but it's hard to say if data corruption caused the other errors of the other errors caused corruption. 2014-02-08 23:24:58 =ERROR REPORT ** State machine 0.3211.0 terminating ** Last message in was {'EXIT',0.168.0,shutdown} ** When State == active ** Data == {state,1438665674247607560106752257205091097473808596992,riak_search_vnode,{vstate,1438665674247607560106752257205091097473808596992,merge_index_backend,{state,1438665674247607560106752257205091097473808596992,0.3212.0}},undefined,none,undefined,undefined,0.3221.0,{pool,riak_search_worker,2,[]},undefined,86616} ** Reason for termination = ** {timeout,{gen_server,call,[0.3212.0,stop]}} 2014-02-08 23:24:58 =CRASH REPORT crasher: initial call: riak_core_vnode:init/1 pid: 0.3211.0 registered_name: [] exception exit: {{timeout,{gen_server,call,[0.3212.0,stop]}},[{gen_fsm,terminate,7,[{file,gen_fsm.erl},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,proc_lib.erl},{line,227}]}]} ancestors: [riak_core_vnode_sup,riak_core_sup,0.162.0] messages: [{'EXIT',0.3221.0,shutdown},{#Ref0.0.1.215952,ok},{'EXIT',0.3212.0,normal}] links: [] dictionary: [{random_seed,{27839,21123,25074}}] trap_exit: true status: running heap_size: 46368 stack_size: 24 reductions: 24758 neighbours: This is one of the riak_search vnodes crashing because it's merge index process crashed. Which is expected given the circumstances. 2014-02-08 23:24:58 =ERROR REPORT ** State machine 0.5392.1 terminating ** Last message in was {'$gen_sync_all_state_event',{0.5390.1,#Ref0.0.1.215861},{shutdown,6}} ** When State == ready ** Data == {state,{[],[]},0.5393.1,[],undefined} ** Reason for termination = ** {timeout,{gen_fsm,sync_send_all_state_event,[0.5393.1,stop]}} 2014-02-08 23:24:58 =CRASH REPORT crasher: initial call: riak_core_vnode_worker_pool:init/1 pid: 0.5392.1 registered_name: [] exception exit:
Re: Search schemas in 2.0pre11
On Tue, Feb 4, 2014 at 4:38 PM, Jeremy Pierre j.14...@gmail.com wrote: Hi Eric, Thanks very much - getting a 405 response for that curl command though. POST to same endpoint yields the following: The schema resources does not accept POST requests. Only PUT and GET. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna and array of JSON documents
Hi Srdjan, On Mon, Feb 3, 2014 at 12:06 PM, Srdjan Pejic spe...@gmail.com wrote: [{viewer_id_s=004615eb-5c0e-4c4a-890c-c6fc29e3fc56, video_time_i=475, type_s=joined}, {viewer_id_s=635dcd2d-fdeb-46c1-9920-803ccdd6176b, video_time_i=522, type_s=joined}, {viewer_id_s=04b3cec7-6f37-4840-b1b6-eff4c16dd273, video_time_i=159, type_s=joined}, {viewer_id_s=6ce3da5f-b598-4b1c-abf0-38ba92fa15fb, video_time_i=393, type_s=upvote}] My question to you is how can I search this array of documents through Yokozuna/Solr? Currently, I get 0 results back, which I suspect is because the actual JSON data is nested in an array and Yokozuna doesn't index that in an expected way. Assuming you are using the default schema, the issue is that you are using non multi-valued fields and thus this data is failing to index. If you check your console.log you should see errors with the string multiple values encountered for non multiValued field in them. Try changing your field names to the following: viewer_id_s = viewer_id_ss video_time_i = video_time_is type_s = type_ss -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search and Yokozuna Backup Strategy
Hi Elias, On Mon, Jan 27, 2014 at 2:40 PM, Elias Levy fearsome.lucid...@gmail.comwrote: Any comments on the backup strategy for Yokozuna? Will it make use of Solr's Replication Handler, or something more lower level? Will the node need to be offline to backup it up? There is no use of any Solr replication code--at all. Yokozuna (new Riak Search, yes I know the naming is confusing) can be thought of as secondary data to KV. It is a collection of index postings based on the canonical and authoritative KV data. Therefore, the postings can always be rebuilt from the KV data. AAE provides an automatic integrity check between the KV object and its postings that is run constantly in the background. Given that, there are two ways I see backup/restore working. 1. From a local, file-level perspective. You take a snapshot of your node's local filesystem and use that as a save point in case of future corruption. In this case you don't worry yourself with cluster-wide consistency, it's just a local backup. If you ever have to restore this data then AAE and read-repair can deal with any divergence that is caused by using the restore. Although, you could end up with resurrected data depending on your delete policy and age of backup. Another issue is that various parts of Riak that write to disk may not be snapshot safe. It's already been discussed how leveldb isn't. I'm willing to bet Lucene isn't either. Any case where a logical operation requires multiple filesystem writes you have to worry about the snapshot occurring in the middle of the logical operation. I have no idea how Lucene would deal with snapshots that occur at the wrong time. I'm unsure how good it is at detecting, and more importantly, recovering from corruption. This is one reason why AAE is so important. I do demos at my talks where I literally rm -rf the entire index dir and AAE rebuilds it from scratch. This will not necessarily be a fast operation in a real production database but it's good to know that the data can always be re-built from the KV data. If you can cover the KV data then you can always rebuild the indexes. 2. Backup/restore as a logical operation in Riak itself. We currently have a backup/restore but from what I hear it has various issues and needs to be fixed/replaced. But, assuming there was a backup command that worked I suppose you could try playing games with Yokozuna. Perhaps Yokozuna could freeze an index from merging segments and backup important files. Perhaps there are replication hooks built into Solr/Lucene that could be used. I'm not sure. I'm handwaving on purpose because I'm sure there are multiple avenues to explore. However, another option is to punt. As I said above the indexes can be rebuilt from the KV data. So if you have a backup that only works for KV then the restore operation would simply re-index the data as it is written. Yokozuna currently uses a low-level hook inside the KV vnode that notices any time that KV data is written so it should just work assuming restore goes through the KV code path and doesn't build files directly. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search and Yokozuna Backup Strategy
On Mon, Jan 27, 2014 at 4:02 PM, Elias Levy fearsome.lucid...@gmail.comwrote: So it would appear to do it properly, we'd need some support from Yokozuna to take the snapshot, return a list of files to backup or back them up itself (hard links?), and then to allow an application to signal it to release the snapshot or release it itself if its doing the backup. If you want to do local, file-based backups, yes. It would appear Yokozuna needs code added in order to backup the Lucene directories without issue. In the interim there is still the option of only backing up the KV data and rebuilding the indexes from that. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Timeouts on Riak Search
On Sun, Jan 26, 2014 at 7:41 PM, ender extr...@gmail.com wrote: I am continuously getting the following types of errors in my riak logs: 2014-01-27 00:06:39.735 [error] 0.220.0 Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at 0.18590.125 exit with reason {{modfun,riak_search,mapred_search,[Mediastream,(type:image type:video type:FacebookPost) AND (teamSlug:nba.san-antonio-spurs home_teamSlug:nba.san-antonio-spurs away_teamSlug:nba.san-antonio-spurs)]},error,{badmatch,{error,timeout}},[{riak_search,mapred_search,3,[{file,src/riak_search.erl},{line,55}]},{riak_kv_mrc_pipe,send_inputs,3,[{file,src/riak_kv_mrc_pipe.erl},{line,627}]},{riak_kv_mrc_pipe,'-send_inputs_async/3-fun-0-',3,[{file,src/riak_kv_mrc_pipe.erl},{line,557}]}]} in context child_terminated I have just started using Riak last week, so most likely it's user error on my part. Would be grateful for any assistance! I have also attached some additional info (app.config, log files etc) to this email. Thanks, Satish Satish, I see you are doing a disjunction search across the type field `(type:image type:video type:FacebookPost)`. How many documents match that sub-query? If it is over 100k then legacy Riak Search will fail to return on the query causing it to timeout. In general, legacy Riak Search has issues with larger result sets. In 2.0 there is a new version of Riak Search (code name Yokozuna) which should have much less issues with larger result sets. You can try playing with it via the 2.0.0pre11 download. http://docs.basho.com/riak/2.0.0pre11/downloads/ http://docs.basho.com/riak/2.0.0pre11/downloads/ -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Timeouts on Riak Search
On Mon, Jan 27, 2014 at 5:20 PM, Ryan Zezeski rzeze...@basho.com wrote: I see you are doing a disjunction search across the type field `(type:image type:video type:FacebookPost)`. How many documents match that sub-query? If it is over 100k then legacy Riak Search will fail to return on the query causing it to timeout. In general, legacy Riak Search has issues with larger result sets. In 2.0 there is a new version of Riak Search (code name Yokozuna) which should have much less issues with larger result sets. You can try playing with it via the 2.0.0pre11 download. http://docs.basho.com/riak/2.0.0pre11/downloads/ http://docs.basho.com/riak/2.0.0pre11/downloads/ -Z I meant to include these links as well: https://github.com/basho/yokozuna#getting-started https://github.com/basho/yokozuna/tree/develop/docs ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Split index with Riak 2.0 git (jan 15th) on a single dev node cluster
John, 1. What did you use to load the data? Do you have a script? 2. What content-type is the data? 3. Do you see any errors in the log directory? Check error.log and solr.log. 4. Do you get any results for the query q=_yz_err:1 5. Did you wait at least 1 second before running the queries? 6. What version of Riak are you using? 7. Are you by chance using curl to run these test queries? If so can you please copy/paste or gist the entire curl input and output for each of the 3 different results? -Z On Tue, Jan 21, 2014 at 1:46 PM, John O'Brien j...@boardom.ca wrote: Issue: When running searchs against a single dev node cluster, pre-populated with 1000 keys, bitcask backend, search=on and a /search/svan?q=* search URI, the solr response is coming back with three different resultsets, 330 values, the other 354, the other 345. The range of keys 0-1000 are split in no obvious pattern between the 3 result shards.. Anyone have any clue as to what I may have messed up in the config? I assume this is not expected behaviour. Other than that, it works great. ;) Cheers, John ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.13.0
Riak Users, It was a little late due to the holidays but Yokozuna 0.13.0 is here. This release brings an upgrade to Solr, support for indexing Riak Data Structures, the ability to reload indexes via `riak attach`, and includes a query performance boost. See the release notes for more details. https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0130 Given the number of breaking changes since the Riak 2.0.0pre5 release I recommended using the 0.13.0 source package until a new Riak pre-release is made. This way the documentation can be followed without trouble. See the install instructions for more detail. https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md#source-package -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Cluster restarted and doesn’t respond to queries
On Mon, Dec 23, 2013 at 12:35 PM, Justin Lambert jlamb...@letsevenup.comwrote: I do see some errors in the error.log, but the referenced directories don’t appear to exist: 2013-12-23 16:52:33.290 [error] 0.1721.0@riak_kv_bitcask_backend:move_unused_dirs:607 Failed to move unused data directory ./data/leveldb/388211372416021087647853783690262677096107081728. Reason: eexist This error is interesting. It is coming from the bitcask backend but trying to read a leveldb directory. My guess is something happened with your configuration when you upgraded. What does your app.config look like? -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [Confusing search docs] Enabling search on bucket in Riak 2.0
If a field isn't specified then it will default to 'text'. Which should work for plain text. But just as a sanity check I'd also be curious to see the results of the *:* query. On Wed, Nov 27, 2013 at 11:15 AM, Eric Redmond eredm...@basho.com wrote: That is not a valid solr query. You need to search by field:value. Try: http://192.168.1.10:8098/solr/logs/select?q=*:* Eric On Nov 27, 2013, at 7:23 AM, Kartik Thakore kthak...@aimed.cc wrote: Cool. I did the data activate and emptied out the bucket and set the props and created a different index. Still no go Here is the data: [2013-11-27T15:21:30] [ERROR] [192.168.1.102] [zach.scratchd.ca] [0] [ thakore.kar...@gmail.com] test Here is the search: http://192.168.1.10:8098/solr/logs/select?q=* http://192.168.1.10:8098/solr/logs/select?q=test No results found On Tue, Nov 26, 2013 at 8:56 PM, Ryan Zezeski rzeze...@basho.com wrote: Kartik, The pre7 tag incorporates the new bucket type integration. Bucket types are a new feature in 2.0 that provide additional namespace support and more efficient bucket properties (good for when you have many buckets with custom properties). The particular code you are running against requires that for data to be indexed in Yokozuna it must be stored under a non-default bucket type. Since you are not specifying a type the logs bucket lives under the default type where `yz_index` will not be applied. This will be changed for 2.0 so that any type of bucket may be indexed. In the meantime, try this: riak-admin bucket-type create data '{props:{}}' riak-admin bucket-type activate data curl -X PUT -H 'content-type: application/json' ' http://host:port/types/data/buckets/logs/props' -d '{props:{yz_index:allLogs}}' That above will change soon as well. We are attempting to rename most user facing parts of Yokozuna to just search. This means that `yz_index` will soon become `search_index`. Sorry for the inconvenience as things are in a bit of flux leading up to 2.0. -Z On Tue, Nov 26, 2013 at 5:59 PM, Kartik Thakore kthak...@aimed.ccwrote: So finally got a chance to try this and I am running into issues (I am on Riak 2.0pre7) btw. I have yz turned on: http://192.168.1.10:8098/yz I created the index with: $ curl -i http://192.168.1.10:8098/yz/index/allLogs HTTP/1.1 200 OK Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) Date: Tue, 26 Nov 2013 22:51:15 GMT Content-Type: application/json Content-Length: 41 {name:allLogs,schema:_yz_default} And associated the search to the bucket probs: http://192.168.1.10:8098/buckets/logs/props { props: { allow_mult: true, basic_quorum: false, big_vclock: 50, chash_keyfun: { mod: riak_core_util, fun: chash_std_keyfun }, dw: quorum, last_write_wins: false, linkfun: { mod: riak_kv_wm_link_walker, fun: mapreduce_linkfun }, n_val: 3, name: logs, notfound_ok: true, old_vclock: 86400, postcommit: [ ], pr: 0, precommit: [ ], pw: 0, r: quorum, rw: quorum, small_vclock: 50, w: quorum, young_vclock: 20, yz_index: allLogs } } I put in a text/plain entry with: http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=47ffPuWSln7VhlTl02raJA [2013-11-26T22:43:26] [ERROR] [192.168.1.102] [0] [ thakore.kar...@gmail.com] test http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=6IYwwPE27eUbs8ThaSOcTC [2013-11-26T22:39:59] [ERROR] [192.168.1.102] [0] [ thakore.kar...@gmail.com] test But when I search: http://192.168.1.10:8098/search/allLogs?q=* No results http://192.168.1.10:8098/search/allLogs?q=test No results Whats going on? On Thu, Nov 21, 2013 at 12:45 PM, Ryan Zezeski rzeze...@basho.com wrote: On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc wrote: Thank you. I am creating indexes with: curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \ -H 'content-type: application/json' \ -d '{schema : _yz_default, bucket : logs }' But when I check the index with: curl -i http://192.168.1.10:8098/yz/index/allLogs It drops the bucket association HTTP/1.1 200 OK Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) Date: Wed, 20 Nov 2013 20:45:21 GMT Content-Type: application/json Content-Length: 41 {name:allLogs,schema:_yz_default} Sorry, that documentation is out of date. To associate an index to a bucket you need to set the bucket's properties. curl -XPUT -H 'content-type: application/json' ' http://localhost:8098/buckets/logs/props' -d '{props:{yz_index:allLogs}}' You can perform a GET on that same resource to check the yz_index property is set. Also what is going on here curl -XPUT -H'content-type:application/json' http://localhost:8098/buckets/people/keys/me \ -d'{ name_s : kartik }' Why not: curl -XPUT -H'content-type:application/json' http://localhost:8098/rial/people/me \ -d'{ name_s : kartik
Re: [Confusing search docs] Enabling search on bucket in Riak 2.0
Kartik, The pre7 tag incorporates the new bucket type integration. Bucket types are a new feature in 2.0 that provide additional namespace support and more efficient bucket properties (good for when you have many buckets with custom properties). The particular code you are running against requires that for data to be indexed in Yokozuna it must be stored under a non-default bucket type. Since you are not specifying a type the logs bucket lives under the default type where `yz_index` will not be applied. This will be changed for 2.0 so that any type of bucket may be indexed. In the meantime, try this: riak-admin bucket-type create data '{props:{}}' riak-admin bucket-type activate data curl -X PUT -H 'content-type: application/json' 'http://host:port/types/data/buckets/logs/props' -d '{props:{yz_index:allLogs}}' That above will change soon as well. We are attempting to rename most user facing parts of Yokozuna to just search. This means that `yz_index` will soon become `search_index`. Sorry for the inconvenience as things are in a bit of flux leading up to 2.0. -Z On Tue, Nov 26, 2013 at 5:59 PM, Kartik Thakore kthak...@aimed.cc wrote: So finally got a chance to try this and I am running into issues (I am on Riak 2.0pre7) btw. I have yz turned on: http://192.168.1.10:8098/yz I created the index with: $ curl -i http://192.168.1.10:8098/yz/index/allLogs HTTP/1.1 200 OK Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) Date: Tue, 26 Nov 2013 22:51:15 GMT Content-Type: application/json Content-Length: 41 {name:allLogs,schema:_yz_default} And associated the search to the bucket probs: http://192.168.1.10:8098/buckets/logs/props { props: { allow_mult: true, basic_quorum: false, big_vclock: 50, chash_keyfun: { mod: riak_core_util, fun: chash_std_keyfun }, dw: quorum, last_write_wins: false, linkfun: { mod: riak_kv_wm_link_walker, fun: mapreduce_linkfun }, n_val: 3, name: logs, notfound_ok: true, old_vclock: 86400, postcommit: [ ], pr: 0, precommit: [ ], pw: 0, r: quorum, rw: quorum, small_vclock: 50, w: quorum, young_vclock: 20, yz_index: allLogs } } I put in a text/plain entry with: http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=47ffPuWSln7VhlTl02raJA [2013-11-26T22:43:26] [ERROR] [192.168.1.102] [0] [ thakore.kar...@gmail.com] test http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=6IYwwPE27eUbs8ThaSOcTC [2013-11-26T22:39:59] [ERROR] [192.168.1.102] [0] [ thakore.kar...@gmail.com] test But when I search: http://192.168.1.10:8098/search/allLogs?q=* No results http://192.168.1.10:8098/search/allLogs?q=test No results Whats going on? On Thu, Nov 21, 2013 at 12:45 PM, Ryan Zezeski rzeze...@basho.com wrote: On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc wrote: Thank you. I am creating indexes with: curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \ -H 'content-type: application/json' \ -d '{schema : _yz_default, bucket : logs }' But when I check the index with: curl -i http://192.168.1.10:8098/yz/index/allLogs It drops the bucket association HTTP/1.1 200 OK Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) Date: Wed, 20 Nov 2013 20:45:21 GMT Content-Type: application/json Content-Length: 41 {name:allLogs,schema:_yz_default} Sorry, that documentation is out of date. To associate an index to a bucket you need to set the bucket's properties. curl -XPUT -H 'content-type: application/json' ' http://localhost:8098/buckets/logs/props' -d '{props:{yz_index:allLogs}}' You can perform a GET on that same resource to check the yz_index property is set. Also what is going on here curl -XPUT -H'content-type:application/json' http://localhost:8098/buckets/people/keys/me \ -d'{ name_s : kartik }' Why not: curl -XPUT -H'content-type:application/json' http://localhost:8098/rial/people/me \ -d'{ name_s : kartik }' In Riak 1.0.0 we changed the resource from '/riak/bucket/key' to '/buckets/bucket/keys/key'. We were supposed to deprecate and eventually remove the old resource but we never did. You can still use the old style but I would recommend using the new style as it is what we use in official docs and there is a chance perhaps the old resources don't stay up to date with the latest features. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search Crashes
On Wed, Nov 20, 2013 at 2:38 PM, Gabriel Littman g...@connectv.com wrote: 1) We are installed via deb package ii riak 1.4.1-1 Riak is a distributed data store There's a 1.4.2 out but your issue doesn't seem to have anything do with a specific 1.4.1 bug. 2) We did recently upgrade our to riak python library 2.0 but I also have a cluster still on the 1.4 client that has similar problems. Okay, so for now we assume the client upgrade didn't cause the issues either. 3) We less recently upgraded riak itself from 1.2.x to 1.4. We ended up starting with an empty riak store in the processes. Honestly we've had many problems with search index even under 1.2. Mostly riak would get into a state where it would continuously crash after startup until we deleted /var/lib/riak/merge_index on the node and then rebuilt the search index via read/write. The particular problems I'm having now I cannot confirm if they were happening under riak 1.2 or not. The 1.2 issues may very well have been caused by a corruption bug that was fixed in 1.4.0 [1]. looks like allow_mult is false, but I just confirmed with my colleague that *it was previously set to true* so it could be that we have a hold over issue from that. $ curl 'http://10.1.2.95:8098/buckets/ctv_tvdata/props' {props:{allow_mult:false,basic_quorum:false,big_vclock:50,chash_keyfun:{mod:riak_core_util,fun:chash_std_keyfun},dw:0,last_write_wins:false,linkfun:{mod:riak_kv_wm_link_walker,fun:mapreduce_linkfun},n_val:3,name:ctv_tvdata,notfound_ok:false,old_vclock:86400,postcommit:[],pr:0,precommit:[{fun:precommit,mod:riak_search_kv_hook},{mod:riak_search_kv_hook,fun:precommit}],pw:0,r:1,rw:1,search:true,small_vclock:50,w:1,young_vclock:20}} So after setting allow_mult back to false you'd have to make sure to resolve any siblings but that should be done automatically for you now that allow_mult is false again. However, the commit hook will also crash if you have allow_mult set to true on Riak Search's special proxy object bucket. Looking at your original insert crash message I notice the problem is actually with the proxy objets stored in this bucket [2]. What does the following curl show you: curl 'http://host:post/buckets/_rsid_ctv_tvdata/props' I bet $5 it has allow_mult set to true. Try setting that to false and see what happens. Since it is now set to false now would you have a suggestion on how to clear the problem? (Delete merge_index?) You shouldn't have to delete merge index files unless they are corrupted. Let's see if we can fix your insert/index problem first. Then we can work on search if it is still broken. -Z [1]: https://github.com/basho/merge_index/pull/30 [2]: It's not easy to see by there is the atom 'riak_idx_doc' which indicates this is a proxy object created by Riak Search. If you squint hard enough you can see the analyzed fields as well. I should have looked more closely the first time. This is not an obvious error. I wouldn't expect many people to pick up on it. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [Confusing search docs] Enabling search on bucket in Riak 2.0
On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc wrote: Thank you. I am creating indexes with: curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \ -H 'content-type: application/json' \ -d '{schema : _yz_default, bucket : logs }' But when I check the index with: curl -i http://192.168.1.10:8098/yz/index/allLogs It drops the bucket association HTTP/1.1 200 OK Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) Date: Wed, 20 Nov 2013 20:45:21 GMT Content-Type: application/json Content-Length: 41 {name:allLogs,schema:_yz_default} Sorry, that documentation is out of date. To associate an index to a bucket you need to set the bucket's properties. curl -XPUT -H 'content-type: application/json' ' http://localhost:8098/buckets/logs/props' -d '{props:{yz_index:allLogs}}' You can perform a GET on that same resource to check the yz_index property is set. Also what is going on here curl -XPUT -H'content-type:application/json' http://localhost:8098/buckets/people/keys/me \ -d'{ name_s : kartik }' Why not: curl -XPUT -H'content-type:application/json' http://localhost:8098/rial/people/mehttp://localhost:8098/buckets/people/keys/me \ -d'{ name_s : kartik }' In Riak 1.0.0 we changed the resource from '/riak/bucket/key' to '/buckets/bucket/keys/key'. We were supposed to deprecate and eventually remove the old resource but we never did. You can still use the old style but I would recommend using the new style as it is what we use in official docs and there is a chance perhaps the old resources don't stay up to date with the latest features. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search Map Reduce error
Roger, Riak Search has a hardcoded max result set size of 100K items. It enforces this to prevent blowing out memory and causing other issues. Riak Search definitely has some issues when it comes to handling a use case like yours. That said, our new Search solution in 2.0 (code named Yokozuna) should do a lot better. Not only does it not have the hardcoded 100K limit but it should also execute the queries faster. In some cases by 1-3 orders of magnitude (10-1000x). At that point you're more likely to be slowed down by the map-reduce. You might even be able to remove that stage by using stored fields, but I'd need to know more about your use case. I agree that current Riak (pre 2.0) is not a general search solution. Riak Search can work very well but it requires some hand holding and careful vigilance of how you index and query the data. I feel that the new Search (Yokozuna) fixes this in many ways. In general, it has more robust search support and lower, more consistent latency. Yokozuna would also have no issues dealing with 1 million objects. My micro benchmark that I run is 1-10 million objects. Granted, they are small plain-text objects, but I'm fairly confident it would work with your 1 million objects. I realize that Riak 2.0, and thus the new search functionality, is not out yet. We have an early release, Riak 2.0.0pre5 [1], that you can try. I also do monthly releases of the new search functionality [2]. So if you want to kick the tires I can point you in the right direction. -Z [1]: http://docs.basho.com/riak/2.0.0pre5/downloads/ [2]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md On Wed, Nov 20, 2013 at 11:45 AM, Roger Diller ro...@flexrentalsolutions.com wrote: I could dig up all our nitty gritty Riak details but I don't think that will help really. The point I think is this: Using search map reduce is not a viable way to do real time search queries. Especially ones that may have 2000+ plus results each. Couple that with search requests coming in every few seconds from 300+ customer app instances and you literally bring Riak to it's knees. Not that Riak is the problem really, it's just we are using it in a way it was not designed for. In essence, we are using Riak as a search engine for our application data. Correct me if I'm wrong but Riak is more for storing large amounts of KV data, but not really for finding that data in a search sense. Am I missing something here? Is there a viable way for doing real time search queries on a bucket with 1 million keys? On Mon, Nov 18, 2013 at 5:29 PM, Alexander Sicular sicul...@gmail.comwrote: More info please... Version Current config Hardware Data size Search Schema Etc. But I would probably say that your search is returning too many keys to your mr. More inline. @siculars http://siculars.posthaven.com Sent from my iRotaryPhone On Nov 18, 2013, at 13:59, Roger Diller ro...@flexrentalsolutions.com wrote: Using the Riak Java client, I am executing a search map reduce like this: MapReduceResult result = riakClient.mapReduce(SEARCH_BUCKET, search).execute(); ^is this part a typo. Cause otherwise it looks like you do a smr, set the search and then another smr. String search = systemId: + systemName + AND indexId: + indexId; MapReduceResult result = riakClient.mapReduce(SEARCH_BUCKET, search).execute(); This worked fine when the bucket contained a few thousand keys. Now that we have far more data stored in the bucket (at least 250K keys), it's throwing this generic error: com.basho.riak.client.RiakException: java.io.IOException: {error:map_reduce_error} We've also noticed that storing new key/values in the bucket has slowed WAY down. Any idea what's going on? Your data set is incorrectly sized to your production config. Are there limitations to Search Map Reduce? Certainly Are there configuration options that need changed? Possibly Any help would be greatly appreciated. -- Roger Diller Flex Rental Solutions, LLC Email: ro...@flexrentalsolutions.com Skype: rogerdiller Time Zone: Eastern Time ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com -- Roger Diller Flex Rental Solutions, LLC Email: ro...@flexrentalsolutions.com Skype: rogerdiller Time Zone: Eastern Time ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search Crashes
Hi Gabriel, First, let me verify a few things. 1. You are on Riak 1.4? Which patch version? 1.4.2? 2. You recently upgraded you client? Did you have any of these failures before upgrading the client? 3. Have you made any other changes between the time your system was working and the time it started exhibiting these failures? For example, set allow_mult=true? Given that you are having 'badmatch' hook crashes during insert I have the suspicion that allow_mult was recently changed to true as the Riak Search hook cannot deal with siblings. What does the following curl show: curl 'http://host:port/buckets/ctv_tvdata/props' If that has 'allow_mult: true' then that is your issue. As for your search operations. I'm not sure why they are failing. If you want you could tar.gz all the logs for each node and email that to me. -Z On Mon, Nov 18, 2013 at 7:00 PM, Gabriel Littman g...@connectv.com wrote: Hi All, We've been working with a search enabled bucket in riak for a while now and off and on it has been giving us trouble. In the past it has been solved by reindexing all the data by just reading and writing the data back into riak. But even this is failing now on some input data. Any help/insite would be greatly appreciated. We are on riak 1.4 We have recently switched to riak python api 2.0 smrtv@fre-prod-svr15:~$ python Python 2.7.3 (default, Aug 1 2012, 05:14:39) [GCC 4.6.3] on linux2 Type help, copyright, credits or license for more information. import riak r = riak.RiakClient() b = r.bucket('ctv_tvdata') o = b.get('/data/v2/search_show/TMS.Show.9838380') o.data {'type': 'show', 'expires': '99', 'subject_name': 'Monsters vs. Aliens', 'sub_type': 'Series', 'topic': '__ref--/data/v2/topic/TMS.Show.9838380:r1384276501.854346', 'person': '__None__', 'searchable_key': 'aliens vs monstersvsaliens monsters', 'date': '2013-11-23', 'sport': '__None__', 'genre': 'Children', 'id': '/data/v2/search_show/TMS.Show.9838380'} o.store() Traceback (most recent call last): File stdin, line 1, in module File /usr/local/lib/python2.7/dist-packages/riak/riak_object.py, line 281, in store timeout=timeout) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 127, in wrapper return self._with_retries(pool, thunk) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 69, in _with_retries return fn(transport) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 125, in thunk return fn(self, transport, *args, **kwargs) File /usr/local/lib/python2.7/dist-packages/riak/client/operations.py, line 289, in put timeout=timeout) File /usr/local/lib/python2.7/dist-packages/riak/transports/http/transport.py, line 144, in put return self._parse_body(robj, response, [200, 201, 204, 300]) File /usr/local/lib/python2.7/dist-packages/riak/transports/http/codec.py, line 64, in _parse_body self.check_http_code(status, expected_statuses) File /usr/local/lib/python2.7/dist-packages/riak/transports/http/transport.py, line 446, in check_http_code (expected_statuses, status)) Exception: Expected status [200, 201, 204, 300], received 500 Using protocol buffs gives an erlang riak_search_kv_hook,precommit,error: r = riak.RiakClent(protocol='pcb') Traceback (most recent call last): File stdin, line 1, in module AttributeError: 'module' object has no attribute 'RiakClent' r = riak.RiakClient(protocol='pcb') Traceback (most recent call last): File stdin, line 1, in module File /usr/local/lib/python2.7/dist-packages/riak/client/__init__.py, line 99, in __init__ self.protocol = protocol or 'http' File /usr/local/lib/python2.7/dist-packages/riak/client/__init__.py, line 118, in _set_protocol repr(self.PROTOCOLS)) ValueError: protocol option is invalid, must be one of ['http', 'https', 'pbc'] r = riak.RiakClient(protocol='pbc') b = r.bucket('ctv_tvdata') o = b.get('/data/v2/search_show/TMS.Show.9838380') o.store() Traceback (most recent call last): File stdin, line 1, in module File /usr/local/lib/python2.7/dist-packages/riak/riak_object.py, line 281, in store timeout=timeout) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 127, in wrapper return self._with_retries(pool, thunk) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 69, in _with_retries return fn(transport) File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line 125, in thunk return fn(self, transport, *args, **kwargs) File /usr/local/lib/python2.7/dist-packages/riak/client/operations.py, line 289, in put timeout=timeout) File /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/transport.py, line 194, in put MSG_CODE_PUT_RESP) File /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py, line 43, in _request
Re: Riak Yokozuna and a schema
Leif, I quickly wrote up a gist to show how you can use a custom schema with your index and associate it with multiple buckets. Be warned that the current version of Riak/Yokozuna uses bucket properties for storing the index association. These are stored in the ring and have a known limitation. The next version (0.12.0) will use a much more efficient mechanism for storing association but will also change some of the steps outlined in that gist. The Yokozuna API is still a bit of a moving target leading up to the Riak 2.0 Final release. https://gist.github.com/rzezeski/7488192 -Z On Fri, Nov 15, 2013 at 7:11 AM, Leif Gensert l...@propertybase.com wrote: Hello everyone, I am currently evaluating Riak for a project of ours. Here are the requirements in a nutshell: - We get various customer data as json with different field names (let’s just pretend that we have books). - We need to store these data as it comes (JSON with the original field names). - We need to have a consistent search index with fields specified by us. Example: Customer A: { book_title: ‘Alice in wonderland’, num_of_pages: 314, } Customer B: { book_name: ’Sherlock Holmes in the Hound of the Baskervilles’, number_of_pages: 164, } So far so good. This data need to be stored for example like this: { title: ‘Alice in wonderland’, pages: 314, } { title: ’Sherlock Holmes in the Hound of the Baskervilles’, pages: 164, } My thought was this: - Store data from each customer to a different bucket. - Index the data from the document to Yokozuna (after all, Solr has a schema so we could utilize that) My question concerning this would be: What’s the best way to do this? So far the only tutorials I found concerning Yokozuna index the documents without a schema. best Leif ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna Schema Changes
Hi Jeremiah, Yes. I very much want the ability to update the schema in 2.0. More fundamental things have leap frogged it. Technically you can modify a schema today but it has to be done by hand and is error prone. -Z On Fri, Nov 8, 2013 at 6:42 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: I notice that YZ issue 130 (support for schema updates) was created 5 months ago and doesn't have any commits against it right now. Is this still on track to get pushed into the product as part of Riak 2.0 or has no work begun? Thanks --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna Schema Changes
Absolutely not ideal. However, adding the ability to more easily mutate the schema will come with a cost. Adding a field that wasn't there before and you only want it indexed for newly written objects, easy. Adding a field and you want to re-index your objects, little tricker. Removing a field that has hundreds, thousands, millions, etc of matching Solr documents; better be careful. Changing the field type or analysis chain; now you are asking for serious trouble. I still plan to add the feature but mutating a schema must be done with caution. I will probably just end up writing a bunch of scary documentation to warn of the pitfalls :) -Z On Fri, Nov 8, 2013 at 6:56 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: Yeah, I ran into some difficulties while trying to modify schema. Even after modifications I ended up having to do a rolling restart of the cluster to get YZ to pick up the new schema. Obviously a rolling restart of Riak isn't the biggest issue on earth, it's not ideal either. --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop On Fri, Nov 8, 2013 at 3:54 PM, Ryan Zezeski rzeze...@basho.com wrote: Hi Jeremiah, Yes. I very much want the ability to update the schema in 2.0. More fundamental things have leap frogged it. Technically you can modify a schema today but it has to be done by hand and is error prone. -Z On Fri, Nov 8, 2013 at 6:42 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: I notice that YZ issue 130 (support for schema updates) was created 5 months ago and doesn't have any commits against it right now. Is this still on track to get pushed into the product as part of Riak 2.0 or has no work begun? Thanks --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Yokozuna 0.11.0
As promised here is documentation on how to use the new security features with Yokozuna. Unfortunately, I also found a bug while making this document so not everything will work as described until the next release. Essentially the HTTP search authorization will always fail since it's checking against the schema resource type rather than the index resource (but protocol buffers should work). Also, some of the terminology used in this document may change in the next few months as we polish things. https://github.com/basho/yokozuna/blob/develop/docs/SECURITY.md -Z On Thu, Nov 7, 2013 at 9:29 AM, Ryan Zezeski rzeze...@basho.com wrote: Riak Users, Today I'm happy to announce the 0.11.0 release of Yokozuna. This release brings Riak Java Client support as well as authentication and security for the HTTP and protocol buffer transports. An access control list (ACL) may be created to control administration and access to indexes. All official Riak clients should now have full support for Yokozuna's administration and search API. Stored boolean fields and tagging support were fixed for the protocol buffer transport. And finally, documentation was added. The new CONCEPTS document [1] goes over various important concepts in Yokozuna and the RESOURCES document [2] has links to other resources for learning. There isn't much documentation on specifically how to use the new security features besides in the pull request itself but I will rectify that soon with a security specific doc page. This release may confuse some people given that the Riak 2.0 Tech Preview (2.0.0pre5) [3] was just released last week. Why continue with separate Yokozuna releases? What is the difference? These questions are answered in the INSTALL document [4], but the short story is that Yokozuna runs on a monthly release cycle and therefore out paces official Riak releases. These monthly releases allow you to try the latest Yokozuna features and bug fixes without waiting for the next Riak release. These releases should never be used for production. See the INSTALL document for more information. In summary: If you need to test the latest features then use the Riak-Yokozuna source package. Otherwise just stick to the tech preview until the final Riak 2.0 package drops. Finally, the only feature in Riak-Yokozuna 0.11.0 not found in Riak 2.0.0pre5 is the security feature. See the release notes and install document for more details. https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0110 https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md -Z [1]: https://github.com/basho/yokozuna/blob/develop/docs/CONCEPTS.md [2]: https://github.com/basho/yokozuna/blob/develop/docs/RESOURCES.md [3]: http://docs.basho.com/riak/2.0.0pre5/downloads/ [4]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.11.0
Riak Users, Today I'm happy to announce the 0.11.0 release of Yokozuna. This release brings Riak Java Client support as well as authentication and security for the HTTP and protocol buffer transports. An access control list (ACL) may be created to control administration and access to indexes. All official Riak clients should now have full support for Yokozuna's administration and search API. Stored boolean fields and tagging support were fixed for the protocol buffer transport. And finally, documentation was added. The new CONCEPTS document [1] goes over various important concepts in Yokozuna and the RESOURCES document [2] has links to other resources for learning. There isn't much documentation on specifically how to use the new security features besides in the pull request itself but I will rectify that soon with a security specific doc page. This release may confuse some people given that the Riak 2.0 Tech Preview (2.0.0pre5) [3] was just released last week. Why continue with separate Yokozuna releases? What is the difference? These questions are answered in the INSTALL document [4], but the short story is that Yokozuna runs on a monthly release cycle and therefore out paces official Riak releases. These monthly releases allow you to try the latest Yokozuna features and bug fixes without waiting for the next Riak release. These releases should never be used for production. See the INSTALL document for more information. In summary: If you need to test the latest features then use the Riak-Yokozuna source package. Otherwise just stick to the tech preview until the final Riak 2.0 package drops. Finally, the only feature in Riak-Yokozuna 0.11.0 not found in Riak 2.0.0pre5 is the security feature. See the release notes and install document for more details. https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0110 https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md -Z [1]: https://github.com/basho/yokozuna/blob/develop/docs/CONCEPTS.md [2]: https://github.com/basho/yokozuna/blob/develop/docs/RESOURCES.md [3]: http://docs.basho.com/riak/2.0.0pre5/downloads/ [4]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna: Riak Python client PB error with Solr stored boolean fields
Dave, This is a bug in Yokozuna. I have a hot-patch you can try. I'll email you directly with an attachment. I created an issue as well. https://github.com/basho/yokozuna/issues/209 -Z On Mon, Oct 14, 2013 at 9:27 PM, Dave Martorana d...@flyclops.com wrote: I studied the problem I was having with using the Python client's .fulltext_search(...) method and got it down to this - it seems that I get an error when searching against Solr using the Python client's .fulltext_search(...) method (using protocol buffers) whenever I have a * stored* boolean field. In my schema, I have: field name=banned type=boolean indexed=true stored=true / With that (or any named field of type boolean that is set to stored=true) I receive the following stack trace: http://pastebin.com/ejCixPEZ In the error.log file on the server, I see the following repeated: 2013-10-15 01:21:17.480 [error] 0.2872.0@yz_pb_search:maybe_process:95 function_clause [{yz_pb_search,to_binary,[false],[{file,src/yz_pb_search.erl},{line,154}]},{yz_pb_search,encode_field,2,[{file,src/yz_pb_search.erl},{line,152}]},{lists,foldl,3,[{file,lists.erl},{line,1197}]},{yz_pb_search,encode_doc,1,[{file,src/yz_pb_search.erl},{line,144}]},{yz_pb_search,'-maybe_process/3-lc$^0/1-0-',1,[{file,src/yz_pb_search.erl},{line,76}]},{yz_pb_search,maybe_process,3,[{file,src/yz_pb_search.erl},{line,76}]},{riak_api_pb_server,process_message,4,[{file,src/riak_api_pb_server.erl},{line,383}]},{riak_api_pb_server,connected,2,[{file,src/riak_api_pb_server.erl},{line,221}]}] Does anyone have any insight? I'm not a Solr expert, so perhaps storing boolean fields for retrieval is not a good idea? I know if I index but don't store, I can still successfully search against a boolean value. Thanks! Dave ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: riak search restore
Jon, The schema file is stored in a special bucket '_rs_schema' as well as cached in memory. -Z On Fri, Oct 4, 2013 at 2:33 AM, Jon Debonis j...@trov.com wrote: Hello, Riak includes these commands: search-cmd set-schema [INDEX] SCHEMAFILE search-cmd show-schema [INDEX] Once imported/loaded, where is this schema file stored? Is it in a bucket, or on the filesystem? Thanks Jon ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.10.0
Riak Users, The 0.10.0 release of Yokozuna is ready. This release brings a few features such as an upgrade in Solr version along with some basic indexing and query stats. The default index has been removed returning write performance closer to baseline for non-indexed buckets. Disk usage was decreased by removing the default index and the unused timestamp from the entropy data. Among the list of other fixes a notable one is the improvement of Solr start-up and crash semantics. If Solr crashes too frequently then Yokozuna will stop the local Riak node. https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0100 For installation instructions see the INSTALL doc. https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md The 0.11.0 release should be wrapped up towards the end of next week. https://github.com/basho/yokozuna/issues?milestone=11state=open -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.9.0
Riak Users, The ninth release of Yokozuna has arrived. It is now integrated with the Riak development branch. This means no more special merge branches and you get the latest and greatest Riak code. It also means Yokozuna is on track to be delivered with the next release of Riak. There is now support for index and schema administration over protocol buffers. A major performance regression was fixed. An AAE deadlock issue was fixed. And work has started for Riak Search migration. For a full list of changes see the release notes. https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#090 For installation instructions see the INSTALL doc. https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna and Spatial Search
Vincenzo, I replied on the GitHub issue. -Z On Sun, Sep 8, 2013 at 8:23 AM, Vincenzo Vitale vincenzo.vit...@gmail.comwrote: I got it working with this change to the default conf: https://github.com/basho/yokozuna/pull/169 Before doing this, I first tried creating my own schema but put was hanging. V. On Sun, Sep 8, 2013 at 3:56 AM, Vincenzo Vitale vincenzo.vit...@gmail.com wrote: Hi, I'm trying to make SpatialSearch in my application working with Yokozuna. (develop branch, hash 601560bf9ea0859e598957c13733fbbb0e656e17 of the 6th september) The json object looks like this: {where:{latitude:7430019,longitude:4210023,geolocation_p:7.430019,4.210023},timestamp:2013-09-08T01:10:07.752Z} since there is already a dynamic field for *_p defined. But the query: http://127.0.0.1:8093/solr/my-index/select?q=*:*fq={!geofilt}spatial=truept=7.430019%2C4.210023sfield=where_geolocation_pd=1http://127.0.0.1:8093/solr/my-index/select?q=*:*fq=%7B!geofilt%7Dspatial=truept=7.430019%2C4.210023sfield=where_geolocation_pd=1 returns the error: can not use FieldCache on multivalued field: where_geolocation_p_0_coordinate Looking at this: http://stackoverflow.com/questions/7068605/solr-spatial-search-can-not-use-fieldcache-on-multivalued-field it seems the problem is the missing parameter and the dynamic field declaration for *_coordinates in the configuration file. Is this the cause of the problem? The _yz_default.xml files in the data directory seems overwritten every time riak is restarted, is there a way to customize the solr configuration per bucket? Thanks in advance, Vincenzo. -- If your e-mail inbox is out of control, check out http://sanebox.com/t/mmzve. I love it. -- If your e-mail inbox is out of control, check out http://sanebox.com/t/mmzve. I love it. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: riak core dumped, merge_index corruption?
Deyan, What Riak version are you running? There was a corruption issue discovered and fixed in the 1.4.0 release. https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved https://github.com/basho/merge_index/pull/30 As for fixing, you'll want to delete the buffer files for the partitions which are having issues. E.g. if you look in crash.log you'll see partition numbers for the crashing vnodes. ** Data == {state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0} In the /storage/riak/merge_index/685078892498860742907977265335757665463718379520 you'll see buffer files. You'll want to delete those. After deleting all these bad buffers Riak Search should start fine. You'll then want to upgrade to 1.4.1 to avoid corruption in the future. Finally, since you have to delete the buffers you'll have missing indexes and you'll want to re-index your data. Since only one of your nodes experience corruption you can use the built-in repair functionality to re-index only data for those partitions. First you'll want to attach to one of your nodes. Then for each partition run the following. riak_search_vnode:repair(P) Make sure to run repair for only one partition at a time to avoid overloading anything. To determine when a repair is finished you can periodically call the following. Once it returns 'no_repair' that indicates it has finished. riak_search_vnode:repair_status(P) Here is more information on the repair command. http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/ -Z On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov dyan...@cloudxcel.com wrote: hi, we have a 3 node cluster and one of the node crashed yesterday. Nodes are db1, db2 and db3. We started other services on db1 and db2 and db1 crashed. Currently db2 and db3 are fine, balanced, receiving writes and serving reads. However, db1 has issues starting. When I start the node, it outputs numerous errors and this finally results in a core dump. We use Riak search and this may be the reason for the dump. After starting the node, these are the first errors that are seen in the log file: […] 2013-08-06 11:06:08.989 [info] 0.7.0 Application erlydtl started on node 'r...@db1.locations.cxl-cdn.net' *2013-08-06 11:06:16.675 [warning] 0.5010.0 Corrupted posting detected in /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598 after reading 2281* *49 bytes, ignoring remainder.* 2013-08-06 11:06:18.922 [error] 0.5310.0 CRASH REPORT Process 0.5310.0 with 0 neighbours exited with reason: bad argument in call to erlang:binary_to_term(131,108,0,0,0,1,10 4,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...) in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328 2013-08-06 11:06:20.751 [error] 0.5309.0 gen_fsm 0.5309.0 in state started terminated with reason: no function clause matching riak_search_vnode:terminate({{badmatch,{error,{b adarg,[{erlang,binary_to_term,[131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...],...},...]}}},...}, undefined) line 233 […] Attached is an archive of the /var/log/riak directory. The logs there are for the latest starting attempt. Riak core dumped in a minute or two after being started. Is there a way to fix the merge index corruption and start the node? thank you for your efforts, Deyan ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.8.0
Riak Users, The eighth release of Yokozuna is out. It is now considered alpha and will soon become part of Riak proper. There could still be breaking changes leading up to the 1.0.0 release which is currently scheduled for early October. The main things of interest this release are the re-target to Riak 1.4.0 and removal of a race condition around index creation. See the release notes for more detail. https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#080 Here are install instructions. https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md#source-package -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Data population of Yokozuna on key-path in schema?
As Eric said, the XML extractor causes the nested elements to become concatenated by an underscore. Extractor is a Yokozuna term. It is the process by which a Riak Object is mapped to a Solr document. In the case of a Riak Object whose value is XML the XML is flattened by a) concatenating nested elements with '_' and b) concatenating attributes with '@' (this can be changed if necessary, just ask). Yokozuna provides a resource to test how a given object would be extracted. curl -X PUT -i -h 'content-type: application/xml' 'http://host:port/extract' --data-binary @some.xml This will return a JSON representation of the field-values extracted from the object. You can use a json pretty printer like jsonpp to make it easier to read. -Z On Wed, Jul 17, 2013 at 8:51 PM, Eric Redmond eredm...@basho.com wrote: That's correct. The XML extractor nests by element name, separating elements by an underscore. Eric On Jul 17, 2013, at 12:46 PM, Dave Martorana d...@flyclops.com wrote: Hi, I realize I may be way off-base, but I noticed the following slide in Ryan’s recent Ricon East talk on Yokozuna: http://cl.ly/image/3s1b1v2w2x12 Does the schema pick out values based on key-path automatically? For instance, commitrepoval/repo.../commit automatically gets mapped to the “commit_repo field definition for the schema? Thanks! Dave ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna kv write timeouts on 1.4 (yz-merge-1.4.0)
Dave, I'm currently in the process re-targeting Yokozuna to 1.4.0 for the 0.8.0 release. I'll ping this thread when the transition is complete. -Z On Wed, Jul 17, 2013 at 8:53 PM, Eric Redmond eredm...@basho.com wrote: Dave, Your initial line was correct. Yokozuna is not yet compatible with 1.4. Eric On Jul 15, 2013, at 1:00 PM, Dave Martorana d...@flyclops.com wrote: Hi everyone. First post, if I leave anything out just let me know. I have been using vagrant in testing Yokozuna with 1.3.0 (the official 0.7.0 “release) and it runs swimmingly. When 1.4 was released and someone pointed me to the YZ integration branch, I decided to give it a go. I realize that YZ probably doesn’t support 1.4 yet, but here are my experiences. - Installs fine - Using default stagedevrel with 5 node setup - Without yz enabled in app.config, kv accepts writes and reads - With yz enabled on dev1 and nowhere else, kv accepts writes and reads, creates yz index, associates index with bucket, does not index content - With yz enabled on 4/5 nodes, kv stops accepting writes (timeout) Ex: (env)➜ curl -v -H 'content-type: text/plain' -XPUT ' http://localhost:10018/buckets/players/keys/name' -d Ryan Zezeski * Adding handle: conn: 0x7f995a804000 * Adding handle: send: 0 * Adding handle: recv: 0 * Curl_addHandleToPipeline: length: 1 * - Conn 0 (0x7f995a804000) send_pipe: 1, recv_pipe: 0 * About to connect() to localhost port 10018 (#0) * Trying 127.0.0.1... * Connected to localhost (127.0.0.1) port 10018 (#0) PUT /buckets/players/keys/name HTTP/1.1 User-Agent: curl/7.30.0 Host: localhost:10018 Accept: */* content-type: text/plain Content-Length: 12 * upload completely sent off: 12 out of 12 bytes HTTP/1.1 503 Service Unavailable Vary: Accept-Encoding * Server MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue) is not blacklisted Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue) Date: Mon, 15 Jul 2013 19:54:50 GMT Content-Type: text/plain Content-Length: 18 request timed out * Connection #0 to host localhost left intact Here are my Vagrant file: https://gist.github.com/themartorana/460a52bb3f840010ecde and build script for the server: https://gist.github.com/themartorana/e2e0126c01b8ef01cc53 Hope this helps. Dave ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search and Sorting
Jeremiah, Sorting is broken in protobuffs currently. Unfortunately the fix got lost in the cracks. https://github.com/basho/riak_search/pull/136 -Z On Thu, Jul 18, 2013 at 10:11 AM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: I just confirmed that today I'm getting the correct sorting in the browser but not in CorrugatedIron. I'm about to start in on a day of working with a client. Will verify this afternoon. --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop On Thu, Jul 18, 2013 at 6:55 AM, Ryan Zezeski rzeze...@basho.com wrote: Jeremiah, After a quick glance I don't see anything obvious in the code. I notice you have a presort defined. By any chance, if you remove the presort, do you get a correct sorting on the creation_dt field? -Z On Wed, Jul 17, 2013 at 5:30 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: I'm attempting to sort data with Riak Search and have run into a distinct lack of sorting. When using curl (The Fullest Featurest Riak Client EVAR™), I query the following URL: http://localhost:10038/solr/posts/select?q=title_txt:googlepresort=keysort=creation_dtrows=500 Being aware that results are sorted AFTER filtering on the server side, I adjusted my query to accept too many rows: there are 335 rows that meet my query criteria. However, Riak Search returns 10 sorted by some random criteria that I'm not aware of (it's not score, that's for sure). Is this behavior expected? Is there something that I've missed in my query? --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Data population of Yokozuna on key-path in schema?
Yes, it has similar rules. Nested objects have their fields joined by '_'. Arrays become repetitive field names, which should map to a multi-valued field. You can use the URL I provided in the last response to see exactly how field-values are extracted. On Thu, Jul 18, 2013 at 12:16 PM, Dave Martorana d...@flyclops.com wrote: Does the JSON extractor work in a similar fashion, or does it follow its own rules? We don’t use XML anywhere (but JSON everywhere). Thanks! Dave On Thu, Jul 18, 2013 at 9:31 AM, Ryan Zezeski rzeze...@basho.com wrote: As Eric said, the XML extractor causes the nested elements to become concatenated by an underscore. Extractor is a Yokozuna term. It is the process by which a Riak Object is mapped to a Solr document. In the case of a Riak Object whose value is XML the XML is flattened by a) concatenating nested elements with '_' and b) concatenating attributes with '@' (this can be changed if necessary, just ask). Yokozuna provides a resource to test how a given object would be extracted. curl -X PUT -i -h 'content-type: application/xml' 'http://host:port/extract' --data-binary @some.xml This will return a JSON representation of the field-values extracted from the object. You can use a json pretty printer like jsonpp to make it easier to read. -Z On Wed, Jul 17, 2013 at 8:51 PM, Eric Redmond eredm...@basho.com wrote: That's correct. The XML extractor nests by element name, separating elements by an underscore. Eric On Jul 17, 2013, at 12:46 PM, Dave Martorana d...@flyclops.com wrote: Hi, I realize I may be way off-base, but I noticed the following slide in Ryan’s recent Ricon East talk on Yokozuna: http://cl.ly/image/3s1b1v2w2x12 Does the schema pick out values based on key-path automatically? For instance, commitrepoval/repo.../commit automatically gets mapped to the “commit_repo field definition for the schema? Thanks! Dave ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.7.0
Riak Users, Today I'm excited to bring you the 0.7.0 release of Yokozuna. It includes some new features such as an upgrade to Solr 4.3.0, isolation of index failures, one-to-many index-to-buckets relationship, and map-reduce support. There is also a performance improvement in index throughput. Along with several bug fixes. See the release notes for more detail. https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#070 Once again I have forgone the EC2 AMI. Only a source package is available. You can find instructions for installing on the INSTALL page. https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md For those that have been using Yokozuna. The one-to-many change is a breaking change. Creating an index no longer implicitly indexes the bucket with the same name. Two steps must be performed. FIrst you create the index as before. Second you add a bucket property 'yz_index' whose value is the name of the index you wish to index that bucket under. There is an example in the README. https://github.com/basho/yokozuna#creating-an-index -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: [ANN] Yokozuna 0.7.0
Yokozuna supports protobuff already. It uses the same protocol as current Riak Search so it is currently limited to that feature set. It should just work. However, currently, if both Riak Search and Yokozuna are enabled then Riak Search will handle all queries. Are you asking in regards to CorrugatedIron by chance? -Z On Mon, Jul 1, 2013 at 12:29 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com wrote: What level of PBC integration can we expect from Yokozuna? Is that developed but not documented or is that a TBA feature? --- Jeremiah Peschka - Founder, Brent Ozar Unlimited MCITP: SQL Server 2008, MVP Cloudera Certified Developer for Apache Hadoop On Mon, Jul 1, 2013 at 8:46 AM, Ryan Zezeski rzeze...@basho.com wrote: Riak Users, Today I'm excited to bring you the 0.7.0 release of Yokozuna. It includes some new features such as an upgrade to Solr 4.3.0, isolation of index failures, one-to-many index-to-buckets relationship, and map-reduce support. There is also a performance improvement in index throughput. Along with several bug fixes. See the release notes for more detail. https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#070 Once again I have forgone the EC2 AMI. Only a source package is available. You can find instructions for installing on the INSTALL page. https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md For those that have been using Yokozuna. The one-to-many change is a breaking change. Creating an index no longer implicitly indexes the bucket with the same name. Two steps must be performed. FIrst you create the index as before. Second you add a bucket property 'yz_index' whose value is the name of the index you wish to index that bucket under. There is an example in the README. https://github.com/basho/yokozuna#creating-an-index -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Question: Riak search and wildcard
Hi Otto I will propably make a config file in my app or some temporary variable which will contain 10 keys which I get with a map/reduce than is run daily, and then I'll fetch the predefined set from Riak when I need the 10 first results. Although this will require 10 requests to get 10 results, search would have been ideal since one request can return a big set of results.. It is a pitty the search feature does not support q=* as query. There are reasons for why this is not implemented in Riak Search. For one, it would be massively expensive as it requires iterating through all inverted indexes on a covering set of partitions and building up the entire list of matching keys (i.e. all keys) in memory on the coordinating node. The solution which will replace Riak Search, Yokozuna [1], can perform this operation just fine. But you will need to store the fields if you wish to get their values back in the query result, otherwise you still need 11 operations, 1 for the query, 10 to get the values (or use map/reduce as a multiget). However, since there is nothing to score on, your 10 results are effectively random (or perhaps it would fall back to index order). So I'm not sure I follow you when you say 10 first results. What is frist in relation to? -Z [1]: https://github.com/basho/yokozuna ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.6.0
Riak Users, Today I'm pleased to announce the 0.6.0 release of Yokozuna. Two highlights of this release are: 1. Initial protobuff support at parity with Riak Search. This means that existing Riak clients which have Riak Search/PB support should now be able to query Yokozuna. Please note that, currently, having both Riak Search and Yokozuna enabled will cause issues. That will be addressed soon in order to allow migrations in the future. 2. A 30-40% performance in query throughput thanks to caching of coverage plans. The improvement will vary on workload. It will be most noticeable for slow CPU or query results coming from Solr cache. This is because the patch removes a lot of CPU work during the query from the Yokozuna side. There are a slew of other changes. See the release notes for more detail [1]. I decided to forgo the EC2 version for this release. The base AMI is starting to get long in the tooth and I'm not sure if anyone is actually making use of the Yokozuna AMI. If you need it please ping me via email and I'll be sure to build an 0.6.0 AMI. I've also added updated instructions for installing Riak-Yokozuna. The preferred method now is to use the source package. See the INSTALL doc for more details [2]. [1]: https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#060 [2]: https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting from search index but not from bucket
Is it possible to delete an object from the search index without deleting it from the bucket? I'm using Erlang pb client. It is possible but not a first-class operation and certainly not supported via the PB client. It would require custom erlang code. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna max practical bucket limit
Elias, I could see that being something some folks want. From my point of view, I find that the existing design of one core per bucket may be more useful, so long as I can search across cores with similar schemas (I created an issuehttps://github.com/basho/yokozuna/issues/87to track that feature), as it allows me to easily drop the index for a bucket. In a multi-tenant environment, where you may have an index per customer, this is rather useful. A lot less painful than trying to delete the index (and data) by performing a key listing and delete operations. Well you still can't avoid the key-listing/delete for Riak itself. For Solr this would be a delete-by-query which isn't nearly as expensive. As I've expressed before, I wish buckets behaved the same way, segregating their data into distinct backend, but I understand that this results in lower resource usage, as things like LevelDB caches would then not be shared and you'd need additional file descriptors. At the very least, it would be great if backend instances could be created programmatically through the HTTP or PB API, rather than having to modify app.config and perform a rolling restart. That not very operationally friendly. Yes, there are benefits to be had both ways. Segregating the actual backend instances allows for efficient drop of entire bucket, but adds strain in terms of file descriptors and I/O contention. Multi-backend sorta helps but is static in nature as you mention. As for large number of cores, I could see some folks creating many of them. Buckets are relatively cheap, since by default they are all stored in the default backend instance. Their only cost being the additional network traffic for gossiping non-default bucket properties. So folks create them freely. Once Yokozuna is better documented, it should be pointed out that the same is not true of a bucket's index, since they create one core per bucket. So an indexed bucket has quite a bit more static overhead than non-indexed one. Good point. If you use Riak and have 300 customers, you can easily create a bucket per customer, even if you only have 64 partions and are using Riak Search on all of them, as Search stores all the data in the same merge index backend. You may want to twice before upgrading such cluster to Yokozuna. Well, Riak Search will have issues as well. First, each bucket will require a pre-commit hook to be installed which means custom bucket properties to be copied into the ring. There is a known drawback with Riak where many bucket properties greatly reduce ring gossip throughput and can cause issues. I believe Joseph Blomstedt may have some patches going into the next release that will improve this but ultimately we need to get bucket properties out of the ring. Even if that is solved, Riak Search will have other tradeoffs such as substantially reduced feature support compared to Yokozuna as well as reduced performance for many types of queries. But I do agree many indexes (thus cores) could pose a problem for Yokozuna. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: expected_binaries error in search
Rob, I reproduced this at the command line. Here I'm storing two documents, with IDs 'doc8' and 'doc9', into a search-enabled bucket named 'test-search'. # This command works, even though 'lsh' is empty. I believe this is because I've never put a field named 'lsh' in this bucket, 'test-search'. curl -v -XPUT -H Content-Type: application/json -d '{terms: empty|en string|en test|en, lsh: , segments: d do doc docs}' ' http://riak.lumi:8098/riak/test-search/doc8?returnbody=true' # However, if I use the empty string for a field that has ever been indexed before, *then* I get a crash. curl -v -XPUT -H Content-Type: application/json -d '{terms: , segments: d do doc docs}' ' http://riak.lumi:8098/riak/test-search/doc9?returnbody=true' I copy/pasted your commands and could not reproduce. Both docs indexed correctly and were returned when running the following search. However, the reason this works is because it's going through KV and using the commit hook to index. The error you originally pasted is from data being indexed via the Riak Search Solr endpoint. They are two different things. As it turns out there is a bug(?) in the Solr end-point. It doesn't like empty fields. E.g. if I send the following XML I can reproduce the error. --- ?xml version=1.0 encoding=UTF-8? add doc iddocA/id field name=termsempty|en string|en test|en/field field name=lsh/field field name=segmentsd do doc docs/field /doc /add --- To confirm, here's what I see in the log: 2013-04-22 17:02:56.394 [error] 0.4452.0@riak_solr_indexer_wm:malformed_request:37 Unable to parse request: {expected_binaries,lsh,[]} I filed an issue: https://github.com/basho/riak_search/issues/141 As requested, I typed that redbug command into the Erlang console, and at the moment of the crash I get some output. The output baffles me, though. The included ID is in the style of ID we use for actual documents, which are stored in totally different buckets, and which I didn't refer to at all in my testing. My guess is you tested this on the system while other writes were coming in. This is what redbug is picking up. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Active Anti Entropy with Bitcask Key Expiry
Ben, AAE should not resurrect keys when bitcask expiry is enabled. However, a non-trivial amount of work may be performed if a lot of keys expire all at once. You're correct that the layers above bitcask have no notion of expiry. When a key expires no notification is sent to Riak. This means that hashtrees (which I'll call trees from here on out) will continue storing an entry for a key after it has expired. As long as all trees agree that the key is still there AAE will be none the wiser about expiry. However, AAE has its own notion of expiry. Every tree has en expiration date at which point is is discarded and rebuilt from scratch based on the data in the backend. By default trees expire after a week. This means there could be a window where the trees disagree because some were rebuilt and no longer include the expired key. At this point AAE will try to repair the data by invoking a read-repair. Since bitcask honors expiry on 'get' all N copies will return not_found and thus read-repair will do nothing. Then AAE will send a 'rehash' request to all N replicas [1] [2]. The rehash will notice the key is no longer and delete it from the tree. So, keys should not be resurrected, but it could generate additional I/O proportional to the number of keys expired. For example: 1. bitcask expiry is set to 1 day 2. millions of keys are written in hour time span thus every hour millions of keys expire 3. the same key is never overwritten inside a weeks time 4. AAE is using default tree expiry of a week 5. the trees for a given preflist are _not_ all expired at about the same time In this scenario, when a tree expires it may have millions of expired keys to deal with. This means millions of Riak 'get' calls plus millions of 'rehash' calls. Now, since the rehash operation is sent to all replicas only 1 tree of a preflist needs to expire for all replica trees to be repaired. This means the maximum number of times you should take this hit is Q / N where Q = ring size, N = n_val. Point #3, #4, #5 really are the key here. There must be an overlap where keys are expired and only a subset of a preflist's trees have been rebuilt. The more often keys are re-written and the more nodes you have the less likely it will be to hit this window. -Z [1] https://github.com/basho/riak_kv/blob/master/src/riak_kv_exchange_fsm.erl#L232 [2] https://github.com/basho/riak_kv/blob/master/src/riak_kv_vnode.erl#L482 On Tue, Apr 16, 2013 at 11:07 AM, Ben Murphy benmmur...@gmail.com wrote: Does anyone know if these two place nice with each other? As far as I can see the higher layers sitting on top of bitcask are not aware that bitcask can expire keys. Would the anti-entropy code try to resurrect expired keys? ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: expected_binaries error in search
Rob, That cryptic error is saying that it expected a binary value type for the field 'ish' (binary is a data structure in Erlang) but instead got an empty list. Do you by any chance have the exact data which is causing the issue? If you can isolate the data causing the problems then attach to the riak console and run the following. redbug:start(riak_solr_xml_xform:xform/1 - return). Then in another window try to index the data. Copy the riak console output and mail it to me. My guess is something is getting parsed incorrectly. -Z On Tue, Apr 9, 2013 at 12:49 PM, Rob Speer r...@luminoso.com wrote: We're having problems where Riak nodes stop responding to requests, sometimes while trying to add documents and sometimes while trying to delete them. There are a lot of errors in the logs on all machines, and we're wondering if this has something to do with it. A message like this appears every 1-12 minutes: 2013-04-09 11:47:52.955 [error] 0.29725.18@riak_solr_indexer_wm:malformed_request:37 Unable to parse request: {expected_binaries,lsh,[]} lsh is a field on the data structures we're indexing (it contains arbitrary tokens generated for locality-sensitive hashing). Here's an example of what we might be telling Riak Search to index. (It's intentional that we're using the whitespace analyzer on all fields.) { 'id': 'uuid-1b34a5a7d5894e1f92874066d074ecec', 'subsets': '__all__ subset1', 'terms': 'example|en text|en', 'lsh': 'ANRW BMkA CHyu DN60', 'segments': '1 1b 1b3 1b34' } This would get sent through self.riak.solr.add() in the Riak Python client, of which we're using the latest version committed to master (1a379dc1), via the Protocol Buffers transport. It is possible to store a document that is missing 'terms' or 'lsh'; is Riak complaining about their absence when it throws an expected_binaries error? Would this be causing Riak to stop responding to its client connections? ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: AAE and changing the replication factor
Elias, Setting the n_val higher should add the missing replicas. However, setting lower will currently leave the extra replicas alone. This was chosen to err on the side of caution for now. This would be easy enough to verify with a riak_test [1] but I don't think we have one at the moment. -Z [1]: https://github.com/basho/riak_test On Mon, Apr 8, 2013 at 8:08 PM, Elias Levy fearsome.lucid...@gmail.comwrote: I am wondering if AAE means that we can now change the replication factor of a Riak bucket and have the additional missing replicas be created by AAE, rather than having to reinsert all the data in the bucket. Elias Levy ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Post commit hooks are a single process, so they are executed in the same order as the commits ?
Simon, On Mon, Apr 8, 2013 at 7:14 PM, Simon Majou si...@majou.org wrote: Hello, I want to sync a bucket of a first cluster with the bucket of a second cluster. To do that I think using the post commit hook. If you didn't know, this is exactly what Riak Enterprise was built to do. I.e. handle multi-cluster replication. However, if you want to give it a go on your own a post-commit hook is one way to get the job done. You'll want to think through failure scenarios where the receiving cluster is down and how to deal with msgs that are dropped between clusters. The post-commit hook runs on a process called the coordinator, there is a coordinator for every incoming request. So you won't block the vnodes, which is important, but the client/user request will block until your post-commit returns. Is there any risk that the sequence of PUTs to be mixed in such a scenario ? Do you mean the sequence seen on cluster A vs. cluster B? Are you asking if the object could appear to be on B before A even though the PUT was sent to A? The answer is, it depends. With a healthy system it's probably unlikely but it will depend on your DW values and state of each cluster. E.g. if cluster A nodes get slow disk I/O then perhaps the replication to cluster B could beat writes on A. If we start introducing node and network failures, or changing W/DW values then things can get more complicated. You could have success on cluster A, fire replica to cluster B, all primary nodes for that object on cluster A die, now cluster B will have a key for which cluster A says not_found (well, not totally true, depends on your PR value). -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Yokozuna max practical bucket limit
Elias, This is exactly why I chose not to make a core per partition. My gut feeling was that most users are likely to have more partitions than indexed buckets. I don't know the overhead per-core or what the limits might be. I would recommend the Solr mailing list for questions like that. I've also looked at that LotsOfCores page before. One benefit to using Solr is that any improvements made to it should also trickle down to Yokozuna. That said, I still plan to allow a one-to-many mapping from index to buckets. That would allow many KV buckets to index under the same core. I have an idea of how to implement it. I'm fairly certain it would work just fine. I just need to add a GitHub issue and then it's a simple matter of coding. -Z On Mon, Apr 8, 2013 at 6:25 PM, Elias Levy fearsome.lucid...@gmail.comwrote: Thinking about Yokozuna it would appear that for some set of hardware specs there must be some maximum practical number of indexed buckets. Yokozuna creates one Solr core per bucket per node. Scaling the Riak cluster will reduce the amount of data indexed per core, but not the number of cores node. I assume there is some static overhead per Solr core, and thus a maximum number of indexed buckets per cluster based on the per node resources. Any idea what this may be be, roughly? Has anyone tried to max out the number of indexed buckets? Searching the Solr mailing list it seems some folks have up to 800 cores per slave, but their hardware is unknown and queries are being served by slaves, so the cores are only indexing. It looks like there is some ongoing work in Solr to support large number of cores by dynamically loading and unloading them ( http://wiki.apache.org/solr/LotsOfCores). Is this something Yokozuna may make use of? It may be to expensive a hit for latencies. Elias Levy ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.5.0
Riak Users, Today I'm pleased to announce the 0.5.0 release of Yokozuna. This release includes a bit of everything. New features, bug fixes, an upgrade to Solr 4.2.0, and search performance improvement. See the full release notes for more detail. Thank you to @timdoug and @kyleslattery for their contributions. Release notes: https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#050 EC2 Deployment: https://github.com/basho/yokozuna/blob/5a62fde0a9d79f9ae392922567aadadd47094b53/docs/EC2.md Source Package: https://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.5.0-src.tar.gz -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Set the default search schema analyzer_factory to 'standard_analyzer_factory' for all future buckets
On Mon, Mar 4, 2013 at 9:36 AM, vvsanil vvsanilku...@gmail.com wrote: Is there anyway to set the default search schema analyzer_factory to 'standard_analyzer_factory' for all future buckets (i.e. without having to manually set schema each time a new bucket is created) ? Yes, look for the file `default.def` under your lib dir where `riak_search/priv` lives. That file is used as the default schema when one is not explicitly created. N.B. you must update this file on _EVERY_ node or you may get unexpected results. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search cannot detect element with ''
Tony, Riak Search is treating the '' the same as 'AND'. If you encode it as %5C%26 it should work. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: How should I avoid words that are effectively stopwords in Riak Search?
Rob, Riak Search doesn't have a traditional term-frequency count. It has something similar but it's an estimate and it is much more expensive than a simple table lookup. Even if it did have term-frequency it doesn't really expose it to the outside world. Not only that, but the standard analyzer provides no way to specify additional stop words. You'd have to keep track of this data externally and do some pre-processing to remove stopwords before. For the last 9 months I've been working on a project called Yokozuna with the goal to replace Riak Search [1]. It's like Riak Search except much better because the underlying engine is actually Solr/Lucene, not an inferior clone written in Erlang. In that case you could add new stopwords, exploit query caching, and use newer features like LUCENE-4628 [2] to help combat high frequency terms. You'd also have an easy way to get frequency count for a given term to determine if you should make it a stopword. [1] https://github.com/basho/yokozuna [2] https://issues.apache.org/jira/browse/LUCENE-4628 On Fri, Mar 22, 2013 at 2:21 PM, Rob Speer r...@luminoso.com wrote: My company is starting to use Riak for document storage. I'm pretty happy about how it has been working so far, but I see the messages of foreboding and doom out there about Riak Search and I've encountered a problem myself. I can't really avoid using Riak Search, as full text indexing is a key feature we need to provide. If Riak Search is suboptimal, so is basically every other text index out there. We've just been burned by ElasticSearch's ineffective load balancing (who would have guessed, consistent hashing is kind of important). I know that performing searches in Riak Search that return many thousands of documents is discouraged for performance reasons, and the developers encourage removing stopwords to help with this. There's additionally, I have seen, a hard limit on the number of documents that can be examined by a search query; if any term matches more than 100,000 documents, the query will return a too_many_results error (and, incidentally, things will get so confused that, in the Python client, the *next* query will also fail with an HTTP error 400). The question is, what should I actually do to avoid this case? I've already removed the usual stopwords, but any particular set of documents might have its own personal stopwords. For example, in a database of millions of hotel reviews, the word 'hotel' could easily appear in more than 100,000 documents. If we need to search for '5-star hotel', it's wasteful and probably crash-prone to retrieve all the 'hotel' results. What I'd really like to do is just search for '5-star', which because of IDF scoring will have about the same effect. That requires knowing somehow that the word 'hotel' appears in too many documents. Is there a way to determine, via Riak, which terms are overused so I can remove them from search queries? Or do I need to keep track of this entirely on the client end so I can avoid searching for those terms? Thanks, -- Rob Speer ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.4.0
Hello Riak Users, Today I'm pleased to announce the 0.4.0 release of Yokozuna. This is a small release in terms of features added, as there are no new features, but an important release for reasons enumerated below. * Performance improvements to Solr's distributed search thus improving performance of Yokozuna queries [1] [2] [3]. * This release is based off Riak 1.3.0. This release is essentially Riak 1.3.0 with the Yokozuna bits added to it. * Yokozuna has moved from my personal GitHub account into the Basho organization. The prototype status is still in effect but this is a very important step towards the goal of merging Yokozuna into Riak proper. release notes: https://github.com/basho/yokozuna/blob/v0.4.0/docs/RELEASE_NOTES.md instructions to deploy on ec2: https://github.com/basho/yokozuna/blob/c3a1cad34f65f1f5f1d416f3f25b2ab5254a583a/docs/EC2.md source package: http://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.4.0-src.tar.gz -Z [1] Yokozuna pull-request: https://github.com/basho/yokozuna/pull/26 [2] Upstream patch to Solr: https://issues.apache.org/jira/browse/SOLR-4509 [3] I discuss this change in depth: http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: About the node show down when occur riak_searc error of too_many_resut
Jason, Riak Search has a limit of 100k results at which point it halts processing by throwing an exception. It does this to protect itself from having to build an indeterminately sized list and then sort it. You can raise this limit but you might start seeing large process heaps using lots of CPU for GC or sorting. I'm having a bit of trouble understanding your second point. Are you saying that the node goes down after this error? The only reason I see that happening is if you run this query (or others that also match large results) many times in succession causing max restart events (I'm referring to Erlang/OTP supervisor/worker restarts) to occur eventually reaching up to the root supervisor and thus exiting. -Z On Tue, Jan 22, 2013 at 5:16 AM, 郎咸武 langxian...@gmail.com wrote: Hi all The default value of 100,000 can be custom tuned with the max_search_results setting in the etc/app.config file. I using the default value. There are 1,000,000 K/V. I only invoke riakc_pb_socket:search(Pid, Bucket, name:u1*)[1] when the node shutdown[2]. The error is an obvious out of the default value. But the node is showdown. This is really out of my expectation. Is this a bug? Who can give me some advice? Cheers ,Jason The enviroment: Erlang R14B04 (erts-5.8.5) FreeBSD meda082 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan 3 07:46:30 UTC 2012 riak1.2 riakclint 1.3 [1]{error,Error processing incoming message: throw:{too_many_results,\n {scope...} [2] (riak@127.0.0.1)1 17:03:55.626 [error] gen_server 0.2736.0 terminated with reason: {throw,{too_many_results,{scope,#Ref0.0.0.8359,test_riak_json ,value,{scope,#Ref0.0.0.8358,undefined,name,{range_sized,#Ref0.0.0.8362,{inclusive,u1},{inclusive,u1\377},all,undefined,[{ria k_search_client,'-search/7-fun-0-',4},{riak_search_client,fold_results,5},{riak_search_client,search,8},{riak_search_client,search_doc,8},{riak_search _utils,run_query,7},{riak_search_pb_query,run_query,7},{riak_search_pb_query,process,2},{riak_api_pb_server,process_message,4}]}^M^M 17:03:55.643 [error] CRASH REPORT Process 0.2736.0 with 1 neighbours exited with reason: {throw,{too_many_results,{scope,#Ref0.0.0.8359,test_riak _json,value,{scope,#Ref0.0.0.8358,undefined,name,{range_sized,#Ref0.0.0.8362,{inclusive,u1},{inclusive,u1\377},all,undefined, [{riak_search_client,'-search/7-fun-0-',4},{riak_search_client,fold_results,5},{riak_search_client,search,8},{riak_search_client,search_doc,8},{riak_s earch_utils,run_query,7},{riak_search_pb_query,run_query,7},{riak_search_pb_query,process,2},{riak_api_pb_server,process_message,4}]} in gen_server:te rminate/6^M^M -- 只为成功找方法,不为失败找理由 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[ANN] Yokozuna 0.3.0
Riak Users, Today I'm happy to announce the 3rd pre-release of Yokozuna. It's light on new features but has some good performance improvements and added robustness. Here are the highlights: * Allow store/retrieval of schemas via HTTP. * Upgrade to Solr 4.1.0 and the latest Riak. * Improve write/index throughput by disabling Solr's realtime get and switching from XML update to JSON. * Added robustness around AAE and default index creation. * Listen on 'solr/index/select' to more easily work with existing clients out of the box. To see all changes read the full release notes [1]. Like the last two releases, an AMI has been made, see the EC2 doc for more info [2]. New for this release is the addition of a source package. I hope this might encourage those who are scared off by the process of building from git to give Riak/Yokozuna a try. These four steps below will produce a ready-to-run node under 'rel/riak' [3]. wget http://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.3.0-src.tar.gz tar zxvf riak-yokozuna-0.3.0-src.tar.gz cd riak-yokozuna-0.3.0-src make stage [1]: https://github.com/rzezeski/yokozuna/blob/v0.3.0/docs/RELEASE_NOTES.md [2]: https://github.com/rzezeski/yokozuna/blob/v0.3.0/docs/EC2.md [3]: You may want to change some configuration first: http://docs.basho.com/riak/1.2.1/cookbooks/Basic-Cluster-Setup/ -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search on KV data in erlang terms
Takeshi, The erlang extractors don't work when writing the data via HTTP as they store the data as a binary. I.e. [{name, bob}] becomes [{\name\, \bob\}] instead of the proplist you expect. This is because the extractor assumes the object data is already properly decoded. It is possible to write erlang terms via the erlang client, but I wouldn't recommend it as it is not the common case and I feel like there are other issues lurking in Riak if you do this. If there a particular reason you are trying to store erlang terms? Are you worried about space? I would just stick with JSON or XML if that is acceptable. -Z On Tue, Jan 22, 2013 at 10:15 AM, Takeshi Matsumura takeshi4...@gmail.comwrote: Hi, I tried to store erlang data and query them by using the Riak Search without success so far, and thus would like to ask if I'm doing the right thing. Riak Search was enabled in the app.config file and the server was restarted. The pre-commit hook was installed from the command line. bin/search-cmd install mybucket The erlang data that I uploaded is a proplists with a single pair of key and value. [{name, bob}] It was uploaded by using the curl command with Content-Type application/x-erlang. (hoge.erl.txt contains the above erlang terms). curl -v -d @hoge.erl.txt -X PUT -H content-type: application/x-erlang http://localhost:8098/riak/mybucket/bob; I could get the document by issuing curl command to /riak/mybucket/bob. The HTTP response header contained the correct Content-Type, application/x-erlang. Then I run the Riak Search from the command line. bin/search-cmd search mybucket name:bob Unfortunately, the result said Found 0 results. As I wondered if this is a problem related to the erlang terms, I tried the same with a JSON data that is found in the Indexing and Querying KV Data page but setting application/json to the Content-type. { name:Alyssa P. Hacker, bio:I'm an engineer, making awesome things., favorites:{ book:The Moon is a Harsh Mistress, album:Magical Mystery Tour } } Then the Riak Search could find this document with the query name:Alyssa*. According to the Indexing and Querying KV Data page of the Riak document, erlang terms can be queried by using the Riak Search. However, it is unclear to me if it is enabled by default because the page says XML, JSON, and plain-text encodings are supported out of the box but it doesn't mention the erlang terms. I followed the Other Data Encodings section and set the riak_search_kv_erlang_extractor module, but it didm't change the situation. curl -XPUT -H 'content-type: application/json' \ http://localhost:8098/riak/mybucket \ -d '{props:{search_extractor:{mod:riak_search_kv_erlang_extractor, fun:extract, arg:my_arg}}}' I changed the data as follows and uploaded but it didn't help either. [{name, bob}] I appreciate any hints for me to go forward. Thank you in advance. Best regards, Takeshi ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Yokozuna 0.2.0 [ANN]
Hello Riak Users, For those of you who missed the announcement on Monday, Yokozuna 0.2.0 has been released. This release includes two big features that I discussed in my RICON talk [1] but hadn't yet been completed. 1) Active Anti-Entropy (AAE): This is a process that constantly verifies that the data and it's corresponding indexes are in-sync. This is done in the background, in an efficient manner, that should require no intervention on the user's part. 2) Sibling Support: If allow_mult is set to true, meaning you want Riak to store siblings when there is a conflict, then Yokozuna will index all siblings. If a search matches any sibling of an object then it will be included as a result. When an object's siblings are reconciled to one version then all sibling indexes will be deleted. To see a full list of changes checkout the Basho blog post linked below or go directly to the 0.2.0 release notes if you prefer [2]. http://basho.com/blog/technical/2012/12/31/yokozuna-pre-release-0.2.0-now-available/ -Z [1]: http://vimeo.com/54266574 [2]: https://github.com/rzezeski/yokozuna/blob/master/docs/RELEASE_NOTES.md#020 ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak search query timeout issues with 1.2.1 stable
On Thu, Dec 20, 2012 at 9:51 AM, Abhinav Singh abhinavsi...@ymail.comwrote: error.log on riak@172.17.3.82 contains: 2012-12-20 16:27:37.877 [error] 0.1821.0@mi_server:handle_info:524 lookup/range failure: {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]} 2012-12-20 16:27:37.878 [error] emulator Error in process 0.4075.0 on node 'riak@172.17.3.82' with exit value: {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]} 2012-12-20 16:27:37.878 [error] 0.1940.0@mi_server:handle_info:524 lookup/range failure: {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]} 2012-12-20 16:27:37.882 [error] emulator Error in process 0.4077.0 on node 'riak@172.17.3.82' with exit value: {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]} This is a very specific error and is an indication that the lambda created in riak_search_client cannot be instantiated after being sent over the network. As I said, I have only ever seen this with mixed versions. We don't really have mixed riak release. But yes we do have a mixed erlang releases. Not sure if that makes any difference here. riak@172.17.3.82 Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false] riak@172.17.3.63 Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false] I bet this is it. In order for a lambda to be reconstructed after being sent over the wire very specific conditions need to be met. My guess is either the erlang version is checked explicitly or is part of the module hash, thus causing this failure. Checkout this post by Kresten Krab Thorup. http://www.javalimit.com/2010/05/passing-funs-to-other-erlang-nodes.html Unfortunately none of these error happens on our local dev environment. On my local dev box, I run a 5 node cluster (ofcourse all nodes on same physical machine). Yes, because they are all using the same erlang version. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search - searching across all fields
On Sat, Dec 15, 2012 at 12:43 AM, Matt Painter m...@deity.co.nz wrote: Thanks so much Ryan - yokozuna sounds most promising. If I were building a small system (relatively simple, small user base) that will be production-ready in a few months, do you think that Yokozuna could cut the mustard? I see that it's officially an experimental prototype, but do you think it's stable 'enough' in its current state? Sorry if this is an impossible question to answer with too many variables... My hope is to start delivering packages of Riak/Yokozuna by late February. They probably wouldn't be official Riak packages, but would allow for easier installation for those that don't want to use AMI/source. Compared to Riak Search, Yokozuna will do better in almost all cases but a few. In a few cases Riak Search has an upper hand in latency/throughput but I have tracked down the cause and will be making some patches to Solr's distributed search soon. Otherwise, Yokozuna is better in every way. Language support, analyzer support, features, performance, robustness, etc. Over the next couple of months I hope to start publishing benchmarks and other information. That said, this is still experimental, and I'm not sure I would recommend using Yokozuna in production just yet. But I would love to find users to prototype with to see how well Yokozuna can handle various use cases. If this sounds interesting to you please send me a direct email. I must confess that using a forked Riak makes me a touch queasy for anything other than playpen stuff. Do you think that a combo deal of Riak + elasticsearch could be a suitable compromise for the time-being? The fork of Riak used by Yokozuna is extremely minimal. It mostly consist of bundling the yokozuna library and sending the KV data so it can be indexed. The goal is and will continue to be to make _minimal_ changes outside of Yokozuna. You could certainly combine Riak and ES. Other users of ours have done it. Honestly, go with whatever works for you. No need to wait for Yokozuna if you think you can get it done today with other tools. I will note, however, that something like that will not be as tightly integrated as Yokozuna. Which isn't to say it's bad, it's just a trade off to be aware of. E.g. Yokozuna has built-in active anti-entropy (AAE) with Riak--when data becomes divergent AAE will detect it and fix it for you without requiring action on your part. You won't get that with Riak + external solution (without writing your own code to do it). -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak search query timeout issues with 1.2.1 stable
Hi, comments inline On Wed, Dec 5, 2012 at 8:10 AM, Abhinav Singh abhinavsi...@ymail.comwrote: We are facing an issue where search queries works fine on my local dev box (which have riak-1.2.1rc2 installed). However same queries timeout on our production boxes (which have riak-1.2.1 installed): 2012-12-05 14:49:59.777 [error] 0.1035.0@mi_server:handle_info:524 lookup/range failure: {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]} Did you recently upgrade your production boxes? The 'badfun' error is an indication that you currently have a mixed cluster. The error will occur when two or more machines are involved and they are not all the same version. This is a bug in Riak Search. This query does succeed sometimes (1-5%), but fails most of the times. I want to know if the above logs indicate towards a particular error with our riak cluster? Yes, so in 1-5% of the cases the nodes involved in a query are all the same version. The reasons this is non-deterministic is because Riak Search uses some randomness during query time to help spread load around. Since this query has never failed on my local development box, I suspect either it has to do with something that changed between 1.2.1rc2 and 1.2.1-stable release or something that is related to our production riak cluster. As I said above. I strongly suspect a mixed cluster scenario. That's the only time I've seen an error like the above. The second email also strongly indicates a mixed cluster scenario given the behavior you are seeing. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search - searching across all fields
Matt, comments inline On Tue, Dec 11, 2012 at 3:35 AM, Matt Painter m...@deity.co.nz wrote: Apart from a single default value, is it possible for Riak Search to search for a keyword across all fields in a document without having to specify the field up front as a prefix in one's search term? A field must be specified to search, but a default field may be specified in the schema [1]. This field will be searched if one is not specified. But there is no way to do a search against all fields. It is always over one field. I'm guessing that one solution could be a post-commit hook which recursively iterates over all fields and squashes them into a secondary default value field - but since I know even less about Erlang and am just starting out with Riak, I thought it prudent to see if there was a more straightforward solution... Your use case immediately makes me think of Solr copy fields. You index everything under their individual fields but all values get copied into a catch-all field so that all content may be searched easily. However, with this you lose the ability to know which field it came from. Riak Search doesn't have copy field functionality. You'd have to concatenate all the data into a field on your application side. The new search solution I've been working on, Yokozuna, uses Solr underneath and therefore does support copy fields [2]. You could create a pre-commit hook to do this field-squashing but I think you would be better off doing it in your application. To do it via a hook you'd have to make sure it runs before the search hook (I can't remember if you can force specific order of pre-commit hooks). It would also have an effect on your write latencies as more pre-processing would have to be done. Finally, you would have to write Erlang. The use case is this: We are providing an object + metadata store for users to deposit files and any number of related fragments of structured JSON metadata. We are not enforcing any metadata schema - and therefore can't know up-front any field names - but would like the ability for a dumb keyword search from a website to return references to the records they have deposited in Riak. Essentially, providing a Google-like interface. (As a side question, Is Riak Search mature enough for these type of very generic searches? I know that it's inspired by Lucene and Lucene-like, but I don't know how many of Lucene's goodies are present - or is it just a case of invoking analysers provided by Lucene for things like stemming, and all will be pretty much equivalent for most situations?) There are no goodies present _at all_. Riak Search is an in-house implementation, completely written in Erlang. It's only connection to Lucene/Solr is a superficial interface that looks very much like Lucene/Solr. E.g. you mention stemming, there is no stemming support in Riak Search and would be a non-trivial addition. This is one of the big reasons Yokozuna is being written [2]. The world of search is vast and complicated, best to start with proven solution and build from that. Riak Search generally starts causing pain when you have searches that match tens of thousands of documents. The runtime is proportional to the size of the result set. In fact, Riak Search has a hard-coded upper limit to fail queries that match 100K or more documents (although it does the work to get 100K results and then drops it all on the floor so you still use resources/times). For example, if a lot of your files were pictures and were tagged with something like {type:picture} then a search for picture is probably going to cause issues. Things really start to hurt when you do conjunction queries with multiple large result sets, e.g. funny AND picture. Once again, this is not the case with Yokozuna, which in my benchmarking thus far has shown flat latencies regardless of result set size. -Z [1]: http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Schema/#Defining-a-Schema [2]: https://github.com/rzezeski/yokozuna ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak over LAN vs WAN
Jon, On Mon, Nov 19, 2012 at 3:28 AM, Jon Perez jbperez...@yahoo.com wrote: Why exactly are these reasons? I suspect they have to do with performance. If so, how exactly does performance degrade when nodes are spread out between hosts with latencies in the tens to hundred plus milliseconds (as is typical over WANs)? Yes, latency is a big reason. Every Riak requests involves N vnodes. If those vnodes are spread across different regions with varying latencies then your deviation grows and higher percentiles go through the roof. For some this many be okay, but the wider you go the more unpredictable your latency profile becomes. Predictable latency is key to many applications built on top of Riak. Many of these apps are web-apps with tight constraints of the maximum time any request should take. Given that most webapps are made up of many components behind the scenes it is vital the each individual component deliver as predictably as possible so that the developers can have some confidence on the end-to-end latency of a request. This is a point made very clear in the Dynamo paper which heavily influenced the design of Riak. Or does it have more to do with reliability of connections and overhead in retrying them? I can understand that Bashio has a vested interest in promoting Riak Enterprise for this, but it would be nice if the technical details of why were actually laid out in detail. This is another reason. A node that is dead is indistinguishable from a node that is simply taking a really long time to respond. Once you spread nodes across a WAN the chance for network failure, and thus network partitions, becomes much greater. Riak is designed to always be available for writes but you still want to avoid partitions as much as possible. Partitions are one of the primary causes of siblings, potentially generating lots of sibling resolution. Partitions also cause additional load to be placed on the nodes. Say you had a 6-node cluster configured as 2 3-node clusters in different data centers. If the link between the data centers goes down or becomes too slow you'd end up with a partition between the 2 3-node clusters and each would have to take on the load that was being served by the 6-node cluster. This includes things like disk space, file descriptors, open ports, memory usage, CPU usage, network utilization, etc. From my limited experience with Riak, getting multiple nodes within a cluster going is extremely simple, whereas going multiple clusters is a very different story and requires a new layer of understanding. It's too bad that there is a distinction between nodes over WANs and LANs. I guess the holy grail of dbs is still some ways off, although Riak seems to be the closest fit right now. The way Riak is designed is far from the most efficient way to replicate data across a WAN. A lot of that code was/is written with assumptions of LAN and fairly predictable latency. This is one of the reasons we have a separate piece of software that performs this task. This problem is not as easy as some people think. You should checkout Andrew Thompson's RICON talk on Riak's WAN replication. http://vimeo.com/52016325 -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak over LAN vs WAN
On Mon, Nov 19, 2012 at 10:47 AM, Ryan Zezeski rzeze...@basho.com wrote: The way Riak is designed is far from the most efficient way to replicate data across a WAN. A lot of that code was/is written with assumptions of LAN and fairly predictable latency. This is one of the reasons we have a separate piece of software that performs this task. I think I gave the wrong impression by saying our WAN support is a separate piece of software. It is a separate set of code but is tightly integrated with our enterprise version of Riak. Just wanted to clarify. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search Field Aliases
Brian, First, our documentation is wrong. Sorry. The correct way to add aliases looks like so: {field, [ {name, Name}, {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}, {alias, LastName}, {alias,FirstName}, {alias,MiddleName} ]}, Second, it looks like you want the semantics of a Solr copyField. That is, index the first, middle, and last names individually but also copy their contents into the field `Name` so that a user can easily search against the entire name. Unfortunately Riak Search's alias mechanism doesn't provide this semantic even if the documentation might give that impression. An alias does 2 things: 1. If a field exist with the same name as the alias then index it under the containing field name. E.g. if the field 'LastName' exists then index it under 'Name'. Riak Search only indexes a field once. So either the alias 'FirstName' is found and indexed under 'Name' or 'FirstName' is found and indexed under 'FirstName'. This means that if you declare both an alias and a normal field with the same name the order in the schema will determine which one wins. Both will not be used. 2. If there are multiple alises for a given field then concatenate the values of every alias to form one field value. The order they are concatenated is the same order as they are declared in the object being indexed. Thus if you happened to index {M:Middle, F:Firrst, L:Last} then the 'Name' field would have value 'Middle First Last'. To achieve your goal you need to copy the field yourself. Declare the 'Name' field like the other fields. Don't use aliases. Index the object as {FirstName:First, MiddleName:Middle, LastName:Last, Name:First Middle Last}. -Z P.S. The new search solution I've been working on, Yokozuna, integrates Solr with Riak and thus supports copy fields. http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010042.html On Sun, Oct 28, 2012 at 4:54 PM, Brian Hodgen brian.hod...@gmail.comwrote: Can somebody explain in more detail how the aliases parameter on the search schema definition works? The documentation says it lets me index multiple fields into one, so I tried to setup some schemas to let me search on Name, that is actually the combined data of FirstName, LastName, MiddleName. I've got the search working for the properties by themselves, but I can't seem to make the aliases work, so I'm either doing something wrong or I misunderstood how they are supposed to work. Schema Example: I would really like this to work...but querying on Name never returns any results. { schema, [ {version, 1.1}, {n_val, 3}, {default_field, Name}, {analyzer_factory, {erlang, text_analyzers, noop_analyzer_factory}} ], [ {field, [ {name, FirstName}, {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}} ]}, {field, [ {name, MiddleName}, {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}} ]}, {field, [ {name, LastName}, {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}} ]}, {field, [ {name, Name}, {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}, {aliases, [LastName,FirstName,MiddleName]} ]}, {dynamic_field, [ {name, *}, {skip, true} ]} ] }. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search not working any more
Martin, On Wed, Oct 31, 2012 at 6:01 PM, Martin Streicher martin.streic...@gmail.com wrote: I deleted my database today using rm -rf on the data directory. I stopped Riak before the delete, and restarted after recreating that directory with mkdir. Now, I cannot search. The data directory includes the ring file. The ring file is where custom bucket properties are stored like the search pre-commit hook. By deleting the ring those hooks are lost and need to be re-installed. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Search not working any more
On Sat, Nov 3, 2012 at 4:33 PM, Martin Streicher martin.streic...@gmail.com wrote: Is there a programmatic way to achieve the equivalent of search-cmd install zids, so that whenever my application launches, it can enable those settings? Yes. At startup you could add the 'search' bucket property with a value of 'true'. curl -XPUT -H 'content-type: application/json' ' http://localhost:8098/riak/foo' -d '{props:{search:true}}' That will cause the pre-commit hook to be added. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting items from search index increases disk usage
On Fri, Nov 2, 2012 at 9:52 AM, Jeremy Raymond jeraym...@gmail.com wrote: Some files changed and some didn't. Not really sure how to interpret the differences. Another thing, compacting will occur only if there are 6 or more active segments. So once you get down to 5 segments or less compaction becomes a noop. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting items from search index increases disk usage
Active is any segment file that has the suffix .data. [Sent from my iPhone] On Nov 2, 2012, at 11:11 AM, Jeremy Raymond jeraym...@gmail.com wrote: When do segments become active/inactive? -- Jeremy On Fri, Nov 2, 2012 at 10:50 AM, Ryan Zezeski rzeze...@basho.com wrote: On Fri, Nov 2, 2012 at 9:52 AM, Jeremy Raymond jeraym...@gmail.com wrote: Some files changed and some didn't. Not really sure how to interpret the differences. Another thing, compacting will occur only if there are 6 or more active segments. So once you get down to 5 segments or less compaction becomes a noop. ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting items from search index increases disk usage
Jeremy, On Fri, Nov 2, 2012 at 12:31 PM, Jeremy Raymond jeraym...@gmail.com wrote: I cycled through the compaction on another node. Again after 3 rounds compaction has stopped. On one node the merge index is 26 GB on the other 21 GB. So it looks like I've hit the 5 segment compaction no-op condition on both nodes. I concur. This condition seems arbitrary to me and I'm not sure if there is a good reason for it to exist. But it's there and the only way we could remove it for you is to hot-load a new beam. What would account for the difference in merge_index size? Shouldn't these be relatively the same? There must still be tombstones in there... Riak Search uses term-based partitioning. It could be that you have some terms that are more frequent than others which would account for some of the difference. On my production cluster the merge_index is ~44GB. I estimate that approximately 90 - 95% of the index data belongs to the bucket I no longer want indexed. Manually deleting items from the index then manually triggering compaction doesn't look like it will scale. Will this workflow work to re-build the search index. I need to keep the cluster available for writes while doing this: 1. In a rolling fashion, disable Riak Search one node at a time. 2. Delete the contents of the merge_index on each node. 3. In a rolling fashion, re-enable Riak Search on each node. 4. Reindex the items to be included in the search index. No, instead of disabling Riak Search you'll want to take the nodes down one at a time, remove the merge index data, restart. After doing this for all nodes then re-index your data. This should do the trick right? Do I need to disable search before clearing out the merge_index folders or would disabling the search index on the buckets via search-cmd be enough (and then re-enabling) before re-indexing? Again, don't bother disabling search. The key is to take the nodes down because merge index caches stuff in memory. Actually, I thought of another way to achieve the same result without taking the nodes down. If you have a non-production cluster to test this on that would be a good precaution. I'm 99% sure this should work without issue. 1. Make sure no indexes are incoming, do this either at your client or uninstall all search hooks For each node: 2. Get a list of the MI Pids like in the manual compaction example 3. For each MI Pid call merge_index:drop(MIPid) 3a. Verify the data files were removed on disk After performing steps 2 3 on each node: 4. Re-write the objects you want indexed (of course remember to re-install the hooks if you removed them in step 1) -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
ANN: Yokozuna 0.1.0
Riak Users, I'm happy to announce the first alpha release of Yokozuna. Yokozuna is a new take on providing search support for Riak. It tightly integrates Solr 4.0.0 and the Riak master branch. I'm very excited about Yokozuna. It brings the power of Solr search to Riak. This means language support, analyzer support, and advanced querying support including boolean, ranked, facet, and spatial. You can even query Yokozuna with existing Solr clients*! Basically, if Solr supports it Yokozuna probably does too. On the other side of things Riak uses its great distributed bits to replicate and scale out Solr. Riak provides support for anti-entropy, handoff, distributed queries, and data replication. Together these two technologies complement each other well. Learn more about the 0.1.0 release: https://github.com/rzezeski/yokozuna/blob/master/docs/RELEASE_NOTES.md Getting started with Yokozuna: https://github.com/rzezeski/yokozuna#getting-started If you would rather not build from source an EC2 AMI is provided that may be used: https://github.com/rzezeski/yokozuna/blob/master/docs/EC2.md If you would like to see some very high-level diagrams of Yokozuna's architecture checkout my slides from RICON: https://speakerdeck.com/basho/yokozuna-ricon I'm looking for people to work with me directly to prototype solutions using Yokozuna. If that sounds interesting please email me directly. If you have any questions don't hesitate to ask them on riak-users or email me directly. -Z * - Please note I've only tested this against SolrJ. See https://github.com/rzezeski/yokozuna/blob/7abbc3f7430373a58fdefaa65731759344e86cc7/priv/java/com/basho/yokozuna/query/SimpleQueryExample.java ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting items from search index increases disk usage
Jeremy, I was looking at the merge index code and I think the issue is that the method by which segments are chosen for compaction may be very slow to get to the larger segments. 1. Merge Index only schedules merging when a buffer is rolled-over to a segment. This means there will _always_ be at least one small segment in the list of potential segments to merge. 2. To determine which segments to merge the mean of all segment sizes is taken. Over time the mean will skew left of the bulk of the distribution. This means most compactions will touch only recent, smaller segments and it will take many iterations before one of the larger ones is included. To help verify this you could list all you segment sizes again and compare them with the last run. My guess is you'll have about the same number of segments but the smallest one will have grown a bit. It depends how much unique data you re-indexed. Depending on the distribution of your segment sizes I think it might be possible to reclaim some of this space via repeated compaction calls. It turns out there is a way to manually invoke compaction. It's just not easy to get too. Try running the following gist on one of your nodes https://gist.github.com/3996286. Try running merge_index:compact over and over again and each time check for changes in the segment file sizes. -Z On Thu, Nov 1, 2012 at 11:25 AM, Jeremy Raymond jeraym...@gmail.com wrote: I reindexed a bunch of items that are still in the search index but no disk space was reclaimed. Is there any Riak console Erlang voodoo I can do to convince Riak Search that now would be a good time to compact the merge_index? -- Jeremy On Tue, Oct 30, 2012 at 4:26 PM, Jeremy Raymond jeraym...@gmail.com wrote: I've posted the list of buffer files [1] and segment files [2]. The current data set I have in Riak is static, so no new items are being written. So this looks like the reason as to why compaction isn't happening since there is no time based trigger on the merge index. To get compaction to kick in, I should be able to to just reindex (by reading and rewriting) some of the existing items in buckets that are still indexed? Earlier today I upgraded to Riak 1.2 and ran a Search read repair [3] in an attempt to kick of compaction. Compaction didn't kick in, but instead disk consumption increased again. Should Search Repair trigger compaction or only writing objects to the KV store? [1]:https://gist.github.com/3982718 [2]:https://gist.github.com/3982730 [3]: http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/#Running-a Repair -- Jeremy On Tue, Oct 30, 2012 at 3:47 PM, Ryan Zezeski rzeze...@basho.com wrote: find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Deleting items from search index increases disk usage
Jeremy, This is how Merge Index (the index store behind Riak Search) works. It is log-based meaning deletes are first logical before they become physical. It does not update in-place as you stated in one of your replies. When you performed those deletes new logs were created containing logical deletes (tombstones) causing more disk to be used. Assuming other buckets are still being indexed then compaction should be occurring and tombstones should be reaped. Meaning both the logical delete and the datum should be removed from disk. If no new indexes are arriving then nothing will be compacted as there is no time-based trigger on Merge Index. Merge Index could be doing a bad job of picking which segments to merge, leaving a high % of tombstones on disk longer than necessary. I'm curious, what is the output from the following commands. find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah On Mon, Oct 29, 2012 at 8:19 AM, Jeremy Raymond jeraym...@gmail.com wrote: So the only way to actually free the disk space consumed by the tombstones in the search index is to bring down the cluster and blow away the merge index (at /var/lib/riak/merge_index)? If, and only if, you are no longer indexing _any_ buckets then this would be the thing to do. If you are still indexing some buckets then deleting these files would break their indexes. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: How to search objects using riak_search method
On Tue, Oct 9, 2012 at 7:49 AM, 郎咸武 langxian...@gmail.com wrote: ((ejabberd@meta)19 f(O),O=riakc_obj:new(user1, jason3, list_to_binary([{\name\:\\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2\},{\sex\:\male1\}]), application/json). {riakc_obj,user1,jason3,undefined,[], {dict,1,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...}, {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}}, [{\name\:\\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2\},{\sex\:\male1\}]} (ejabberd@meta)20 riakc_pb_socket:put(Pid, O). ok (ejabberd@meta)28 riakc_pb_socket:search(Pid, user1, list_to_binary(\sex\:male1*)). * The operation is ok.* {ok,{search_results,[{user1, [{id,jason3}, {name, 195,169,194,131,194,142,195,165,194,147,194,178}, {sex,male1}]}], 0.0,1}} Notice the value of the name field in the result here. It has been properly converted to a UTF-8 sequence. That is, at some point Riak Search took your ASCII string of unicode escapes and converted it to a proper unicode byte sequence. (ejabberd@meta)29 riakc_pb_socket:search(Pid, user1, list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2)). {ok,{search_results,[],0.0,0}} *%% But there is empty. Why?* First off, you are adding additional quotes around the name field. 11 list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2). \name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2 Second, you are searching for the ASCII string \\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2. At no point is this string converted to a unicode sequence for you. This is the correct behavior because you might have ASCII documents containing unicode escapes. You need to query using a proper unicode binary. 19 riakc_pb_socket:search(Pid, user1, name:,195,169,194,131,194,142,195,165,194,147,194,178). {ok,{search_results,[{user1, [{id,jason3}, {name, 195,169,194,131,194,142,195,165,194,147,194,178}, {sex,male1}]}], 0.35355299711227417,1}} (ejabberd@meta)30 riakc_pb_socket:search(Pid, user1, list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2*)). {ok,{search_results,[],0.0,0}} *%% There is empty,too. Why?* 20 riakc_pb_socket:search(Pid, user1, name:,195,169,*). {ok,{search_results,[{user1, [{id,jason3}, {name, 195,169,194,131,194,142,195,165,194,147,194,178}, {sex,male1}]}], 0.0,1}} -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: This is about riak search question. How to search utf8 format dat?
On Wed, Oct 10, 2012 at 12:52 AM, 郎咸武 langxian...@gmail.com wrote: *2)To put a Object to user1 bucket. The data is utf8 format.* (trends@jason-lxw)123 f(O), O=riakc_obj:new(user1, jason5,list_to_binary(mochijson:encode({struct, [{name, binary_to_list(unicode:characters_to_binary(爱))},{sex,male}]})), application/json). {riakc_obj,user1,jason5,undefined,[], {dict,1,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...}, {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}}, {\name\:\\\u00e7\\u0088\\u00b1\,\sex\:\male\}} (((trends@jason-lxw)124 riakc_pb_socket:put(Pid, O). ok First, let's start with your data and make sure it's getting stored properly. 3 UC = unicode:characters_to_binary(爱). 231,136,177 Okay, so Erlang properly decoded this into a 3-byte unicode sequence. What does mochijson2 think? (I noticed you are using mochison, I recommend using mochijson2). 4 mochijson2:encode({struct, [{name, UC}]}). [123,[34,name,34],58,[34,\\u7231,34],125] Good, mochijson2 properly interpreted this as u7231. A quick lookup on the web verifies this is correct: http://www.fileformat.info/info/unicode/char/7231/index.htm. But notice in your code you call binary_to_list on the binary before passing it to mochi. Lets see what happened. 15 binary_to_list(UC). [231,136,177] Okay, so the integers are correct. But Erlang treats lists differently from binaries. It's just a list of integers to Erlang. 16 io:format(~ts~n,[binary_to_list(UC)]). ç± ok This is why mochi converted it to 3 chatacters: \\u00e7\\u0088\\u00b1 To make a proper unicode list the unicode:caracters_to_list function must be used. 17 UCS = unicode:characters_to_list(爱). [29233] 18 io:format(~ts~n, [UCS]). 爱 ok Let's try encoding again, but this time leave out the list_to_binary. 19 riakc_obj:new(user1, jason5, mochijson2:encode({struct, [{name, unicode:characters_to_binary(爱)}]}), application/json). {riakc_obj,user1,jason5,undefined,[], {dict,1,16,16,8,80,48, {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...}, {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}}, [123,[34,name,34],58,[34,\\u7231,34],125]} And there we go. A properly encoded unicode character. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: riak search - creating many indexes for one inserted object
Pawel, On Tue, Oct 9, 2012 at 5:21 PM, kamiseq kami...@gmail.com wrote: hi all, right now we are using solr as search index and we are inserting data manually. so there is nothing to stop us from creating many indexes (sort of views) on same entity, aggregate data and so on. can something like that be achieved with riak search?? Just to be sure I understand you. When you say many indexes do you mean something like writing to multiple Solr cores? If so, no, Riak Search cannot do that. It writes to an index named after the bucket you have the hook on. I think that commit hooks are good point to start with but as I read search index is kept in different format than bucket data and I would love to still use solr-like api to search the index. Yes, Riak Search stores index data in a backend called Merged Index. Riak Search has a Solr _like_ interface but it lacks many features, and doesn't have the same semantics or performance characteristics. There is a new project underway called Yokozuna which tightly integrates Riak and Solr. If you like Solr then keep an eye on this. I'm looking for people who want to prototype on it so if that interests you please email me directly. https://github.com/rzezeski/yokozuna example I have two entities cars and parking_lots, each car references parking lot it belongs to. when I create/update/delete car object I would like to not only update car index (so I can search by car type, name, number plates, etc) but also update parking index to easily check how many cars given lot has (plus search lots by cars, or search cars with given property). Why have a separate index at all? Is it not good enough to have just the car index. Each doc would have a 'parking_lot_s' field. How many cars a given lot has -- would be numFound on q=parking_lot_s:foo. Search lots by cars -- I'm guessing you mean something like tell me what lots have cars like this, sounds like a facet on 'parking_lot_s', right? Search cars with a given property -- like the last query but no facet. probably all this can be achieved in many other ways. I can imagine storing array of direct references in parking object and update this object when car object also changed. but this way I need to issue two asynchronous write request with no guaranties that both will be persisted. Yes. This is a problem with two Solr cores as well. I'm not sure if this is a toy example but I don't see the need for 2 indexes. I potentially see 2 buckets: 'cars' and 'lots'. But that doesn't mean it has to be two indexes. Does that make sense? -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Riak Search
On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan pavel.ko...@cortica.comwrote: 1) Is search enabling has any impact on read latency/throughput? If you are reading and searching at the same time there is a good chance it will. It will cause more disk seeks. 2) Is search enabling has any impact on RAM usage? Yes, the index engine behind Riak Search makes heavy usage of Erlang ETS tables. Each partition has an in-memory buffer as well as an in-memory offset table for every segment. It also uses a temporary ETS table for every write to store posting data. The ETS system limit can even become an issue in overload scenarios. 3) In production we have no search enabled. What is the best way to enable search without stop production? I thought about something like: 1) Enable search node after node. You could change the app env dynamically but that's only half the problem. The other half is then starting the Riak Search application. I think application:start(merge_index) followed by application:start(riak_search) should work but I'm not 100% sure and this has not been tested. You'll also want to make sure to edit all app.configs so that it is persistent. 2) Execute some night script that runs on all keys and overwrite them back with proper mime type. Yes, you'll want to install the commit hook on the buckets you wish to index. Then you'll want to do a streaming list-keys or bucket map-reduce and re-write the data. 4) If we see that search overhead is something we can't handle, is there simple way to disable it without stop production? I think the best course of action in this case would be to disable the commit hook. But you would have to keep track of anything written during this time and re-write it after re-installing the hook. If you don't then you'll have to re-index everything because you don't know what you missed. 5) In what case we would need repair? It is said - on replica loss, but if I understand correct we have 3 replicas on different nodes don't we? If it happens how difficult and long would it be for large cluster (about 100 nodes)? Repair is on a per partition basis. Number of nodes doesn't come into play. Repair is very specific in that it requires the adjacent partitions to be in a good, convergent state. If they aren't then repair isn't much help. A lot of these entropy issues go away in Yokozuna. Repairing indexes is done automatically, in the background, in an efficient manner. There is no need to re-write data or run manual repair commands. -Z ___ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com