Re: [riak-users] Cannot startup riak node correctly after successful installation

2015-02-05 Thread Ryan Zezeski

YouBarco writes:

 Hello,

 My OS is ubuntu 14.04 64bit, and I installed erlang from source with version 
 R16B as follows:

That's your problem. You MUST use the custom Basho branch of Erlang/OTP
with Riak. If you insist on building Erlang/Riak from source then follow
this guide for Erlang:

http://docs.basho.com/riak/latest/ops/building/installing/erlang/

 bad scheduling option -sfwi

This flag was added by Basho to the 16B series. IIRC, vanilla 16B02
includes this flag, but you should still use Basho's custom branch since
it has other fixes required by Riak.

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak attach not working properly after operation

2014-12-17 Thread Ryan Zezeski

Oliver Soell writes:

 That was all well and good, but now “riak attach” isn’t giving me the love I 
 thought I should get:

 (c_1494_riak@172.29.18.183)1> {ok, Ring} = 
 riak_core_ring_manager:get_my_ring().
 ** exception error: no match of right hand side value {error,no_ring}
 (c_1494_riak@172.29.18.183)2>


This is because attach is a remote shell (a change introduced in 2.0.0 I
think). Either you need to use an rpc call or you can use riak
`attach-direct`. If you do the latter remember to use Ctrl-D to exit, not
Ctrl-C.
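
Something along these lines from the attached shell should work (just a
sketch; 'riak@127.0.0.1' is a placeholder for your Riak node's actual name):

rpc:call('riak@127.0.0.1', riak_core_ring_manager, get_my_ring, []).

which should give you the {ok, Ring} result expected above.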

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna - inconsistent number of documents found for the same query

2014-12-01 Thread Ryan Zezeski

Eric Redmond writes:

 This is a known issue, and we're still working on a fix.

 https://github.com/basho/yokozuna/issues/426


I don't see how this issue is related to Oleksiy's problem. There is no
mention of removing or adding nodes. I think the key part of Oleksiy's
report is the association of an index _after_ data had already been
written. That data is sometimes missing. These two issues could be
related but I don't see anything in that GitHub report to indicate why.


 On Nov 29, 2014, at 9:26 AM, Oleksiy Krivoshey oleks...@gmail.com wrote:
 
 1. Create a bucket, insert some keys (10 keys -  KeysA)
 2. Create Yokozuna Index, associate it with the bucket
 3. Add or update some new keys in the bucket (10 keys - KeysB)
 4. Wait for Search AAE to build and exchange the trees
 
 Now when I issue a search query I will always get all 10 KeysB but a random 
 amount of KeysA, for example the same query repeated 5 times may return:
 
 10 KeysB + 2 KeysA
 10 KeysB + 0 KeysA
 10 KeysB + 7 KeysA
 10 KeysB + 1 KeysA
 10 KeysB + 10 KeysA
 

Are there any errors in the logs? Does the count go up if you wait
longer? What does `riak-admin search aae-status` show?

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna error during indexing

2014-11-21 Thread Ryan Zezeski

Oleksiy Krivoshey writes:

 Hi,

 I have enabled Yokozuna on existing Riak 2.0 buckets and while it is still
 indexing everything I've already received about 50 errors like this:

 emulator Error in process 0.26807.79 on node 'riak@10.0.1.1' with exit
 value:
 {{badmatch,false},[{base64,decode_binary,2,[{file,base64.erl},{line,211}]},{yz_solr,to_pair,1,[{file,src/yz_solr.erl},{line,414}]},{yz_solr,'-get_pairs/1-lc$^0/1-0-',1,[{file,src/yz_solr.erl},{line,411}]},{yz_solr,'-get_pairs/1-lc$^0/1-0-'...

 Can someone please describe what this means?

I'm fairly certain the base64 library in Erlang is indicating that you
have a truncated base64 string.

https://github.com/basho/otp/blob/OTP_R16B02_basho6/lib/stdlib/src/base64.erl#L211
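
You can see that failure mode from any Erlang shell; a quick sketch (the
exact exception detail varies by OTP release):

base64:decode(<<"Zm9vYmFy">>).  %% <<"foobar">>
base64:decode(<<"Zm9vYmF">>).   %% raises an error: truncated input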

You should be able to attach to the Riak console and run the following
command to get the base64 string:

redbug:start("yz_solr:to_pair -> return").

That will give you the Type/Bucket/Key and the base64 string that is
causing the issue. Knowing that info can help you confirm the issue and
perhaps figure out why it is happening.

Are you using a custom schema?

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna error during indexing

2014-11-21 Thread Ryan Zezeski

Oleksiy Krivoshey writes:

 Yes, I'm using custom schema and custom bucket type. There are many (over
 500) buckets of this type.

Did you modify any of the _yz_* fields?

 Your command has returned the following tuple:

 (riak@10.0.1.1)1> redbug:start("yz_solr:to_pair -> return").
 {1919,1}
 redbug done, timeout - 0

 How do I get bucket/key/base64_string from this?

Yea, this needs to be running when you happen to come across a bad
value.  Try a higher timeout and hope you get lucky.

redbug:start("yz_solr:to_pair -> return", [{time, 600000}]).

That should let it run for 10 minutes (redbug's `time` option is in
milliseconds).

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna error during indexing

2014-11-21 Thread Ryan Zezeski

Oleksiy Krivoshey writes:


 Still, what kind of base64 string can it be? I don't have anything base64
 encoded in my data, its a pure JSON objects stored with content_type:
 'application/json'

The _yz_* fields (which need to be part of your schema and defined
exactly as in the default schema) are generated as part of indexing. The
entropy data field (_yz_ed) uses a base64 encoding of the object hash so
that hashtrees may be rebuilt for the purpose of Active Anti-Entropy
(AAE). My guess is that somehow this value is getting truncated or
corrupted along the way.

https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml#L111

This code is only executed when rebuilding AAE trees. What is the output
from the following?

riak-admin search aae-status

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna error during indexing

2014-11-21 Thread Ryan Zezeski

Oleksiy Krivoshey writes:


  Entropy Trees
 
 Index  Built (ago)
 ---
 11417981541647679048466287755595961091061972992--
 57089907708238395242331438777979805455309864960--
 102761833874829111436196589800363649819557756928   --
 148433760041419827630061740822747494183805648896   --
 194105686208010543823926891845131338548053540864   10.4 hr
 239777612374601260017792042867515182912301432832   --
 285449538541191976211657193889899027276549324800   --
 650824947873917705762578402068969782190532460544   --
 696496874040508421956443553091353626554780352512   --
 742168800207099138150308704113737470919028244480   --
 787840726373689854344173855136121315283276136448   --
 833512652540280570538039006158505159647524028416   --
 879184578706871286731904157180889004011771920384   --
 924856504873462002925769308203272848376019812352   --
 970528431040052719119634459225656692740267704320   --
 1016200357206643435313499610248040537104515596288  --
 1061872283373234151507364761270424381468763488256  9.4 hr
 1107544209539824867701229912292808225833011380224  --
 1153216135706415583895095063315192070197259272192  --
 119061873006300088960214337575914561507164160  12.1 hr
 1244559988039597016282825365359959758925755056128  --
 1290231914206187732476690516382343603290002948096  --
 1335903840372778448670555667404727447654250840064  11.4 hr
 1381575766539369164864420818427111292018498732032  --
 1427247692705959881058285969449495136382746624000  --


So it seems many of these trees are not building because of this issue.
The system will keep trying to build but it will fail every time because
of the bad base64 string. Trying to catch this with redbug will prove
difficult too because it automatically shuts itself off after X events.
That can be changed but then you have to dig through a mountain of
output. Not a fun way to do things.
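
Raising those limits would look something like this (a sketch; redbug's
`time` is in milliseconds and `msgs` caps the number of trace events it
collects before shutting off):

redbug:start("yz_solr:to_pair -> return", [{time, 600000}, {msgs, 10000}]).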

How comfortable are you with Erlang/Riak? Enough to write a bit of code
and hot-load it into your cluster?

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna error during indexing

2014-11-21 Thread Ryan Zezeski

Oleksiy Krivoshey writes:

 Got few results:

 I don't see anything wrong with the first record, but the second record
 mentions the key '/.Trash/MT03' which is not correct, the correct key that
 exists in that bucket is

 '/.Trash/MT03 348 plat frames'


You have found a bug in Yokozuna.

https://github.com/basho/yokozuna/blob/develop/src/yz_doc.erl#L230

https://github.com/basho/yokozuna/blob/develop/java_src/com/basho/yokozuna/handler/EntropyData.java#L139

It foolishly assumes there is no space character used in the type,
bucket, or key names. As a workaround I think you'll have to make sure
your application converts all spaces to some other character (like
underscore) before storing in Riak.
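
A minimal sketch of that kind of sanitizing, e.g. from the Erlang shell
(Sanitize here is just an illustrative helper, not part of any Riak API):

%% replace every space with an underscore before using the key in Riak
Sanitize = fun(Key) -> binary:replace(Key, <<" ">>, <<"_">>, [global]) end.
Sanitize(<<"/.Trash/MT03 348 plat frames">>).
%% => <<"/.Trash/MT03_348_plat_frames">>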

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Solution for Riak 500 Internal Server Error

2014-10-14 Thread Ryan Zezeski

On Oct 14, 2014, at 3:53 AM, ayush mishra ayushmishra2...@gmail.com wrote:

 http://www.dzone.com/links/r/solution_for_riak_500_internal_server_error.html
 ___
 

I recommend _not_ using legacy Riak Search on Riak 2.x.  Why was the legacy 
search pre-commit hook installed in the first place?  Are you trying to use 
search?

Documentation for new search:

http://docs.basho.com/riak/latest/dev/using/search/

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna Scale

2014-09-18 Thread Ryan Zezeski

On Sep 18, 2014, at 4:20 AM, anandm an...@zerebral.co.in wrote:

 Yes - looks like its going that way too - a decoupling between the Solr Cloud
 and the Riak Cluster seems like a solution that could work out and then
 Yokozuna to index content out to the Solr Cloud (completely external to riak
 - Solr not made to baby sit in Riak) - In this arrangement we could maintain
 index distribution with Solr in a Sharded env (over implicit or composite id
 collections) and riak used just as a kv. Yokozuna could also be used as a
 front to Solr Cloud - and it will search on Solr and fetch the matching docs
 from Riak and returned the merged doc back to the client.

Hi, creator of Yokozuna here.  I just want to make it clear that
Yokozuna does not use SolrCloud.  It uses regular old Solr and
Riak does the sharding and replication.  Yokozuna uses Solr’s
Distributed Search which is _not_ SolrCloud.  It uses Riak Core
coverage to build the query plan and feeds that into Solr’s
Distributed Search.


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna Scale

2014-09-18 Thread Ryan Zezeski

On Sep 18, 2014, at 10:35 AM, anandm an...@zerebral.co.in wrote:

 Yes Ryan - that aspect is pretty clear. So will a 1-1 riak-solr Yokozuna
 deployment scale to my requirement? Am I missing something here when
 thinking it wouldn't?

I haven’t followed this thread closely, it was just your last
email that caught my eye.  The only way you'll know if it scales
is if you try.  The one thing I might worry about is the heap
usage.  I'm not sure if Yokozuna will allow it but you might try
tweaking the schema so that the `_yz_*` fields use on-disk
DocValues.  IIRC, this was a change I was thinking of making to
reduce the heap pressure (at the potential cost of extra I/O?).
Honestly, it's been months since I've thought hard about this
stuff.

There is one person that I know of who has pushed Yokozuna a fair
bit and that is Wes Brown.  Perhaps you can track him down and
get some hard-won answers:

http://basho.com/rubicon-io-uses-riak-to-provide-real-time-threat-analysis/
https://github.com/basho/yokozuna/issues?q=is%3Aissue+author%3Awbrown

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Optimistic Locking with Riak Search / is _version_ required in schema.xml?

2014-08-07 Thread Ryan Zezeski

On Aug 7, 2014, at 5:46 PM, David James davidcja...@gmail.com wrote:

 Is _version_ required?

It should not be required. The documentation says it is only needed for
real-time GET, which Riak Search (Yokozuna) disables since Riak KV provides the
get/put implementation.

 I see SolrCloud mentioned in some documentation (see below)? Does Riak Search
 use it?

RS does not make use of SolrCloud at all.  It uses Solr’s Distributed Search 
but that is something that existed well before SolrCloud.  All routing and 
replica administration is handled by Riak.  Each Solr instance (one per node) 
has no awareness of the other nodes except for the explicit distributed queries 
sent by Riak.

 How does Riak Search handle optimistic locking?

It doesn’t use Solr’s optimistic locking at all.  All key-value semantics come 
from Riak itself.  RS simply indexes an object’s values.

 See this comment on the default_schema.xml on Github:
 <!-- TODO: is this needed? -->
 <field name="_version_" type="long" indexed="true" stored="true"/>
 https://raw.githubusercontent.com/basho/yokozuna/develop/priv/default_schema.xml

Yes, I wrote that TODO.  It is one of many that found its way into 2.0.0 :).  
You should run fine without this field if you create a custom schema.

 
 
 P.S. Per https://wiki.apache.org/solr/SchemaXml
 _version_  Solr4.0 - This field is used for optimistic locking in SolrCloud 
 and it enables Real Time Get. If you remove it you must also remove the 
 transaction logging from solrconfig.xml, see Real Time Get.

Just to reiterate what I said above, RS disables the transaction logging and 
thus there is no real time get.  There is no reason for it since that is what 
Riak itself provides.

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search 2.0 Questions

2014-07-24 Thread Ryan Zezeski

On Jul 24, 2014, at 12:29 PM, Andrew Zeneski and...@andrewzeneski.com wrote:

 Been checking out 2.0rc1 and am really excited about the new features (as I 
 think most of us are). I had a couple of questions that I couldn't find 
 answers to scanning the docs. Totally possible I missed it and if so, please 
 feel free to direct me to the proper place.
 
 1. Is there a way to remove a search index and schema? 

No, currently you can only store/update a schema.

 
 2. Do indexes just reference schemas?

An index has an associated schema.  When the index is created locally on a node 
it retrieves that schema from an internal store built into Riak and writes it 
to the directory specific to that index.  The index uses the schema stored in 
its local directory.  Updates to the schema are not automatically propagated to 
the local file.

 More specifically, if I update a schema will those changes propagate to all 
 indexes using that schema? 

No, you will either need to delete the index and recreate it, or attach to 
the Riak console and run the following command:

rp(yz_index:reload(<<"index_name">>)).

This command will fetch the latest version of the associated schema for the 
index “index_name”, overwrite the index’s local schema, and then reload that 
index across the entire cluster.

Why is this all so awkward?  Some of the gory details can be found in these two 
issues if you really want to know:

https://github.com/basho/yokozuna/issues/130
https://github.com/basho/yokozuna/issues/403

 
 The reason I ask is I've been experimenting with simple searching and found 
 in the logs an error indexing a document due to unknown fields. I realized I 
 missed the catch all dynamic field in my schema and updated it. After 
 updating I ran the test again (after deleting any existing data) but the 
 error persists. Leading me to believe that the schema isn't updating. But 
 when I view the schema $RIAK_HOST/search/schema/testschema I see the updates.

Yes, the schema itself has been updated but, as explained above, it is once 
removed from the index and not automatically reloaded.

-Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna search

2014-07-23 Thread Ryan Zezeski

On Jul 23, 2014, at 10:02 AM, Sean Cribbs s...@basho.com wrote:

 . In this case, no, you cannot use wildcards at the beginning [1]. 
 [1] http://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Wildcard 
 Searches
 

Actually, you can place the wildcards * or ? anywhere, it doesn’t matter.  When 
placing it at the start it just means the entire term index will have to be 
searched to determine if the term exists.  A common trick veteran Lucene/Solr 
users will use is to index all terms both forward and backwards, that way you 
can turn a postfix query (e.g. *ly) into a prefix query (e.g. yl*) [1].


 
 On Wed, Jul 23, 2014 at 4:22 AM, Alexander Popov mogada...@gmail.com wrote:
 Will queries support masks at beging and 1 char mask like  *lala and a*
 
 

Yes, it absolutely will.  As Sean said, Yokozuna uses Solr and therefore gives 
all the same functionality so long as that query type is supported by Solr’s 
distributed search (and the most important stuff is [2]).  Yokozuna uses Solr 
4.7.0; the Solr Reference Guide is a great place to learn more about Solr [3].

-Z

[1]: 
http://stackoverflow.com/questions/8515190/solr-reverse-wildcard-field-association

[2]: 
https://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

[3]: 
https://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.7.pdf

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Endless AAE keys repairing

2014-07-17 Thread Ryan Zezeski

On Jul 17, 2014, at 4:30 AM, Daniil Churikov ddo...@gmail.com wrote:

 Hello, In our test env we have 3 nodes riak 1.4.8-1 cluster on debians. 
 According to logs: 2014-07-17 02:48:03.748 [info] 
 0.10542.85@riak_kv_exchange_fsm:key_exchange:206 Repaired 1 keys during 
 active anti-entropy exchange of 
 {936274486415109681974235595958868809467081785344,3} between 
 {936274486415109681974235595958868809467081785344,'riak@10.3.13.96'} and 
 {981946412581700398168100746981252653831329677312,'riak@10.3.13.96'} Messages 
 like this constantly appears, there is not so much load on this test cluster 
 and I expected that eventually everything will be fixed, but this messages 
 keep coming from day to day. In the past we had several issues with one of 
 the cluster participants and as a result we enabled AAE to fix it. What 
 could possibly be the reason for this?

This is probably caused by regular puts.  When AAE performs an exchange it 
takes snapshots of each tree in a concurrent manner.  This means that a 
snapshot could occur while replicas for a given object are still in flight.  
For example:

1. User writes object O.
2. Coordinator sends O to 3 partitions A, B, and C.
3. Partition A accepts O and updates hash tree.
4. Entropy manager on the node which owns partition A decides to perform an exchange 
between A and B.
5. Snapshot is taken of hash tree for A.
6. Snapshot is taken of hash tree for B.
7. Partition B accepts O and updates hash tree (but the update is not reflected 
in the snapshot just taken)
8. Partition C accepts O and updates hash tree.
9. Exchange between A and B determines the object is missing on B and performs a read 
repair.
10. Read repair notices that object O exists on all three partitions and there 
is nothing to be done.

The higher the load the more keys that could be included in one snapshot but 
not the other.  I would say that any time your cluster is accepting writes it 
might be normal to see a handful of keys getting “repaired”.  But if you see, 
say, more than 10 (especially if there are 0 outstanding writes) then that is 
probably a sign of real repair.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: AAE problems

2014-06-19 Thread Ryan Zezeski
On Tue, Jun 17, 2014 at 12:46 PM, István lecc...@gmail.com wrote:


 The entire dataset is idempotent and immutable, so there is not even a
 slightest chance that we are ending up with different values on
 different nodes for the same key in the same bucket. It seems that
 anti-entropy still finds problems:

 /var/log/riak/console.log.4:2014-06-11 06:11:41.756 [info]
 0.6776.6003@riak_kv_exchange_fsm:key_exchange:206 Repaired 1 keys
 during active anti-entropy exchange of
 {536645132457440915277915524513010171279912730624,3} between
 {548063113999088594326381812268606132370974703616,'riak@10.1.11.120'}
 and {559481095540736273374848100024202093462036676608,'riak@10.1.11.121'}


AAE exchange uses snapshots of the trees.  The snapshots on each node will
happen concurrently.  If your cluster is servicing writes as these
snapshots are made then there is a chance a snapshot will be made on one
node containing keys X, Y, Z and on the other node which has only seen keys X
and Y.



 My question would be:

 Is there any reason to let AAE run if we don't mutate the data in
 place?


YES.

Immutable data provides nice semantics for your application but does
_nothing_ to save you from the whims of the stack your application runs on.
 Operating systems, file systems, and hardware all have subtle ways to
corrupt your data both on disk and in memory.  Immutable data also doesn't
help in the more practical case where the network decides to drop packets
and a write only makes it to some of the nodes.



 Is there any way of knowing what is causing the difference according to
 AAE between two nodes?


There is but it requires attaching to Riak and running some diagnostic
commands _when_ a repair takes place.  I'm not sure it will give you any
insight though.  It will either say: 1) remote missing, 2) local missing or
3) hashes are different.


 I was thinking about how this could potentially
 happen and I am wondering if the Java client pb interface supports R
 and W values, so I could make sure that a write goes in with W=(the
 number of nodes we have).


Doubt this will help with the concurrency problem I discussed above but it
will mean your application has a stronger guarantee of how many copies made
it to the nodes.  If you want to make sure they are durable then I would
use DW if Java exposes it [1].

[1]: See the optional query parameters for difference between W, DW, and
PW.
http://docs.basho.com/riak/latest/dev/references/http/store-object/#Request
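
For comparison, the Erlang PB client takes these as per-request options; a
sketch (option names per riakc_pb_socket; the Java client exposes equivalent
settings):

riakc_pb_socket:put(Pid, Obj, [{w, 3}, {dw, 3}]).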

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search sort error in 2.0.0beta1

2014-05-09 Thread Ryan Zezeski
This seems like a bug to me.  I created an issue to track it.

https://github.com/basho/yokozuna/issues/372


On Thu, May 1, 2014 at 5:22 PM, Troy Melhase t...@troy.io wrote:

 Hello again!

 I've narrowed this down to the interaction between the sort parameter and
 the field list (fl) parameter.  It seems that if fl is supplied with
 sort, the field list must contain the value score.  I'm not certain if
 that's a bug or not, but the work-around is plain enough:  add score to
 the list of fields if there's a sort and the field list isn't empty.

 Whew!


 troy



 On Wed, Apr 30, 2014 at 10:09 PM, Troy Melhase t...@troy.io wrote:

 Hello!

 I'm getting an error when I include a sort parameter in
 a RpbSearchQueryReq message.  I'm using Riak 2.0.0beta1.  Source build and
 macos binaries show the same behavior.

 The error doesn't happen at all if I don't specify a search parameter.
  For the parameter value, I'm using field direction (e.g., name asc).
  Leaving off the direction, or encoding the space as + or %20 produces
 a Solr error.

 I've tried Golang and Python clients to see if it was a client issue.
 Both clients produce the exact same error; that error text is at the end of
 this message.

 Is this a known bug?  I searched Github and couldn't find any issues that
 look like this one.  Is there a work-around?  Or better yet, am I doing
 something wrong?

 Thanks!


 troy


 Error text:

 RiakError: 'Error processing incoming message:
 error:badarg:[{protobuffs,encode_internal,
   [2,[],float],

 [{file,src/protobuffs.erl},
{line,167}]},
  {riak_search_pb,iolist,2,
   [{file,

 src/riak_search_pb.erl},
{line,63}]},
  {riak_search_pb,encode,2,
   [{file,

 src/riak_search_pb.erl},
{line,48}]},
  {riak_pb_codec,encode,1,
   [{file,

 src/riak_pb_codec.erl},
{line,77}]},
  {yz_pb_search,encode,1,
   [{file,

 src/yz_pb_search.erl},
{line,60}]},
  {riak_api_pb_server,

 send_encoded_message_or_error,
   3,
   [{file,

 src/riak_api_pb_server.erl},
{line,498}]},
  {riak_api_pb_server,
   process_message,4,
   [{file,

 src/riak_api_pb_server.erl},
{line,430}]},
  {riak_api_pb_server,
   connected,2,
   [{file,

 src/riak_api_pb_server.erl},
{line,262}]}]'



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: how to set eDisMax? Solr start and rows not working properly

2014-05-01 Thread Ryan Zezeski
On Sun, Apr 20, 2014 at 9:57 AM, Buri Arslon buri...@gmail.com wrote:

 Hi guys!

 I searched the docs and the source code but wasn't able to find any info
 about using edismax.

 I have 2 questions:

 1. How to set edismax parser?


You can make use of LocalParams syntax in order to use different query
parsers. For example:

{!edismax}my query

http://wiki.apache.org/solr/QueryParser
http://wiki.apache.org/solr/LocalParams
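
As a concrete sketch with the Erlang PB client (assuming an index named
<<"my_index">> and that the query string, LocalParams prefix included, is
passed through to Solr unchanged):

{ok, Results} = riakc_pb_socket:search(Pid, <<"my_index">>, <<"{!edismax}my query">>).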



 2. Why are start and rows not working properly?


I'll get back to your sort question when I have a chance to verify on
my side.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak 2.0 search changes

2014-04-23 Thread Ryan Zezeski
Alexander,

 1. Does it support wildcards in a middle or start?  *abc, a*bc,

Riak Search 2.0 (Yokozuna) is based on Apache Solr. Any queries
supported by Solr's distributed search are supported by Search
2.0 over HTTP. The PB API has not been altered for Search 2.0 (with
the exception of presort) so if you want to use features like facets
you'll have to use HTTP for now.

https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

For wildcard searches in particular see the following section:

https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser#TheStandardQueryParser-SpecifyingTermsfortheStandardQueryParser

 2. Does presort supports any field instead of  key or score?

There is no presort option for Search 2.0. Presort is a workaround
for sorting issues in the current Search [1,2]. Solr sorts
properly. Although, depending on the fields sorted, the results can
become inconsistent for the same query over time because of a bug in
Search 2.0 [3].

 I haven't found this in the 2.0 docs.

These are the Riak Search 2.0.0beta1 docs.

http://docs.basho.com/riak/2.0.0beta1/dev/using/search/

-Z

1: https://github.com/basho/riak_search/pull/54
2:
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004219.html
3: https://github.com/basho/yokozuna/issues/355
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak search fails to index via riakc_pb client

2014-03-31 Thread Ryan Zezeski

 Index via the erlang client (error)

 ===

 Eshell V5.9.2  (abort with ^G)
  1 {ok, Conn} = riakc_pb_socket:start_link(localhost, 8087).
 {ok,0.34.0}
 2 Body = {\name_s\:\tom\}.
 {\name_s\:\tom\}
 3 Object2Store = 
 riakc_obj:new({testtype,somebucket},1,Body).

 {riakc_obj,{testtype,somebucket},
2,undefined,[],undefined,
{\name_s\:\tom\}}
 4 ok = riakc_pb_socket:put(Conn,Object2Store).
 ok

  The object is stored (in riak), but not indexed (in solr):
  $ curl  http://localhost:8098/types/testtype/buckets/somebucket/keys/
 1
 {name_s:tom}

  == /../riak-yokozuna-0.14.0-src/rel/riak/log/console.log ==
  2014-03-31 16:26:11.568 [error] 0.1448.0@yz_kv:index:204 failed to
 index object {{testtype,somebucket},1} with error
 badarg because
 [{dict,fetch,[content-type,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[[dot|{35,9,254,249,83,57,16,45,{1,63563491571}}]],[],[],[],[],[],[[X-Riak-VTag,53,117,84,48,117,57,84,88,70,88,98,79,53,77,77,76,103,67,102,55,100,74]],[[index]],[],[[X-Riak-Last-Modified|{1396,272371,362491}]],[],[]}}}],[{file,dict.erl},{line,125}]},{yz_doc,extract_fields,1,[{file,src/yz_doc.erl},{line,99}]},{yz_doc,make_doc,5,[{file,src/yz_doc.erl},{line,71}]},{yz_doc,'-make_docs/4-lc$^0/1-0-',5,[{file,src/yz_doc.erl},{line,60}]},{yz_kv,index,7,[{file,src/yz_kv.erl},{line,249}]},{yz_kv,index,3,[{file,src/yz_kv.erl},{line,191}]},{riak_kv_vnode,actual_put,6,[{file,src/riak_kv_vnode.erl},{line,1391}]},{riak_kv_vnode,perform_put,3,[{file,src/riak_kv_vnode.erl},{line,1380}]}]


You failed to provide a content-type when building the object. It's
not easy to see if you aren't used to Erlang but the error in the log
shows a failure to find the key content-type in the object's
metadata.
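
A sketch of the fix, using riakc_obj:new/4 so the content type is set when
the object is built (same values as the session above):

Body = <<"{\"name_s\":\"tom\"}">>,
Obj = riakc_obj:new({<<"testtype">>, <<"somebucket">>}, <<"1">>, Body, "application/json"),
ok = riakc_pb_socket:put(Conn, Obj).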
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: updating the default bucket-type to support Solr indexing?

2014-03-24 Thread Ryan Zezeski
Actually, a bucket type is not required. To associate an index with a
bucket you simply set the `search_index` property. Bucket types provide a
method for many buckets to inherit the same properties. Setting the
search_index property on the bucket type is a way to index multiple buckets
under one index without setting the property on each bucket. Otherwise, I
would suggest setting the property at the bucket level and not the type
level.

The specific problem Paul ran into is that he tried to change properties
for the default type which is a special type that cannot be altered.

-Z


On Mon, Mar 24, 2014 at 11:52 AM, Luke Bakken lbak...@basho.com wrote:

 Hi Paul,

 You are correct, a new bucket type must be created for Riak 2.0 search
 indexes.

 --
 Luke Bakken
 CSE
 lbak...@basho.com


 On Mon, Mar 24, 2014 at 2:27 AM, Paul Walk p...@paulwalk.net wrote:

 I'm experimenting with the technology preview of Riak 2.0, using an
 existing Ruby web application which uses the official Ruby client gem
 (1.4.3).

 My understanding is that if I have not specified particular bucket-types,
 then my buckets are implicitly using a 'default' bucket type. So, in order
 to try the all-new search functionality, I have tried to associate the
 default bucket-type with a search index, thus:

 ./riak-admin bucket-type update default
 '{"props":{"search_index":"my_index"}}'

 which returns the error:

 Error updating bucket type default: no_default_update

 Does this mean that in Riak 2.0, if one wants buckets to be indexed in
 Solr, one must create a new bucket_type in order to associate an index and
 then associate buckets with this?

 Thanks,

 Paul
 ---
 Paul Walk
 http://www.paulwalk.net
 ---






 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: updating the default bucket-type to support Solr indexing?

2014-03-24 Thread Ryan Zezeski
There is support for indexing CRDTs. Field definitions are defined in the
default schema:

https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml#L96


On Mon, Mar 24, 2014 at 12:33 PM, Paul Walk p...@paulwalk.net wrote:

 Thanks Luke.

 If I might be allowed a follow-on question, what is the effect of adding
 an index to a 'typed bucket-type'. For example, if I define a bucket_type
 as follows:

 ./riak-admin bucket-type create map_bucket_type
  '{"props":{"search_index":"my_index","datatype":"map"}}'

 Are the members of any maps stored in a bucket which uses this bucket-type
  going to get indexed in Solr? I would assume that some sort of marshalling
 function and a custom schema would be required?

 Thanks,

 Paul

 On 24 Mar 2014, at 15:52, Luke Bakken lbak...@basho.com wrote:

  Hi Paul,
 
  You are correct, a new bucket type must be created for Riak 2.0 search
 indexes.
 
  --
  Luke Bakken
  CSE
  lbak...@basho.com
 
 
  On Mon, Mar 24, 2014 at 2:27 AM, Paul Walk p...@paulwalk.net wrote:
  I'm experimenting with the technology preview of Riak 2.0, using an
 existing Ruby web application which uses the official Ruby client gem
 (1.4.3).
 
  My understanding is that if I have not specified particular
 bucket-types, then my buckets are implicitly using a 'default' bucket type.
 So, in order to try the all-new search functionality, I have tried to
 associate the default bucket-type with a search index, thus:
 
  ./riak-admin bucket-type update default
  '{"props":{"search_index":"my_index"}}'
 
  which returns the error:
 
  Error updating bucket type default: no_default_update
 
  Does this mean that in Riak 2.0, if one wants buckets to be indexed in
 Solr, one must create a new bucket_type in order to associate an index and
 then associate buckets with this?
 
  Thanks,
 
  Paul
  ---
  Paul Walk
  http://www.paulwalk.net
  ---
 
 
 
 
 
 
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
 

 ---
 Paul Walk
 http://www.paulwalk.net
 ---






 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search Index Not Found

2014-03-22 Thread Ryan Zezeski
On Sat, Mar 22, 2014 at 2:57 PM, Buri Arslon buri...@gmail.com wrote:

 another weird thing I noticed is that after I restart riak,
 get_search_index returns {ok, Index}, but after a few seconds, it's going
 back to {error, notfound}


Do you see any errors related to that index in the solr.log file?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How to disassociate a bucket with a yokozuna index?

2014-03-11 Thread Ryan Zezeski
On Tue, Mar 11, 2014 at 4:17 AM, EmiNarcissus eminarcis...@me.com wrote:

 Now I'm working with riak 2.0 pre17, have tried both set bucket property
 search_index to other index or _dont_index_, but still cannot delete the
 index.


 Failure: riakasaurus.exceptions.RiakPBCException: Can't delete index with
 associate buckets [riakasaurus.tests.test_search] (0)


What are the bucket properties for that bucket? I have a hunch of what it
might be but need to see the properties to verify.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How to disassociate a bucket with a yokozuna index?

2014-03-10 Thread Ryan Zezeski
On Sun, Mar 9, 2014 at 8:31 AM, EmiNarcissus eminarcis...@me.com wrote:

 I'm testing on yokozuna api now, but found every time I call
 delete_search_index function it alerts cannot delete because of pre-existed
 associated bucket. I've tried to set the bucket search-index to another
 index, still have the same error.

 Is this part not implemented yet? or is something I missed from?


Hi Tim,

An index may not be deleted if it has any buckets associated with
it. The 'search_index' property (not 'search-index') must be changed
to either a different index or the sentinel value '_dont_index_'.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Yokozuna 0.14.0

2014-02-26 Thread Ryan Zezeski
It was just pointed out to me that the links in the INSTALL doc were wrong.
The packages have been moved off my s3 account into the main Basho S3
location.

http://s3.amazonaws.com/files.basho.com/yokozuna/pkgs/riak-yokozuna-0.14.0-src.tar.gz
http://s3.amazonaws.com/files.basho.com/yokozuna/pkgs/riak-yokozuna-0.14.0-src.tar.gz.sha1

The latest install doc has the corrected links.

https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md#source-package

-Z



On Mon, Feb 24, 2014 at 12:36 PM, Ryan Zezeski rzeze...@basho.com wrote:

 Riak Users,

 I'm happy to announce the Yokozuna 0.14.0 release. It brings an
 upgrade to Solr 4.6.1 as well as a slew of bug fixes and internal
 enhancements. There are breaking changes made in this release so if
 you are one of the brave souls using Riak 2.0.0pre5/pre11 or a
 previous Yokozuna source release then a rolling upgrade to 0.14.0 may
 not go smoothly.


 https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/RELEASE_NOTES.md#0140


 https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/INSTALL.md

 -Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.14.0

2014-02-24 Thread Ryan Zezeski
Riak Users,

I'm happy to announce the Yokozuna 0.14.0 release. It brings an
upgrade to Solr 4.6.1 as well as a slew of bug fixes and internal
enhancements. There are breaking changes made in this release so if
you are one of the brave souls using Riak 2.0.0pre5/pre11 or a
previous Yokozuna source release then a rolling upgrade to 0.14.0 may
not go smoothly.

https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/RELEASE_NOTES.md#0140

https://github.com/basho/yokozuna/blob/b50470f89cb75069d3a80b99502f3f08cb307f58/docs/INSTALL.md

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: yokozuna Issues

2014-02-19 Thread Ryan Zezeski
Hello Bryce,

On Wed, Feb 19, 2014 at 3:27 PM, Bryce Verdier bryceverd...@gmail.comwrote:

 Hey Hector,

 Thank you for looking into this, here is the response to 'java -version'
 on my machine:
 java -version
 java version 1.7.0_51
 OpenJDK Runtime Environment (fedora-2.4.5.0.fc19-x86_64 u51-b31)
 OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)

 I've been meaning to try buiding from source to see if the problem cropped
 up again, just haven't had the time yet. (I noticed that I didn't have the
 same issue when I built from source on my archLinux desktop --2.0pre14 --.
 Not sure if its related, but I just wanted to make sure).


 2014-02-13 08:59:56.225 [info] 0.547.0@yz_solr_proc:handle_info:134
solr stdout/err: Caused by: java.lang.UnsupportedClassVersionError:
com/basho/yokozuna/monitor/Monitor : Unsupported major.minor version 52.0

This is saying that com.basho.yokozuna.monitor.Monitor was compiled
with J2SE1.8. That will not work with your 1.7 JRE. If you compile
from source then you won't have this issue.

The problem is that Yokozuna has some custom Solr handlers and they
are compiled independently for each separate official Riak builder we
have. In this case our Fedora builder has javac 1.8.0-internal.

This is my fault. The importance of the compiling JDK and our builders
totally slipped my mind. Yokozuna needs to be changed so that we just
compile the JAR once and include it as part of the official build
process.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak crashing when indexing for search

2014-02-13 Thread Ryan Zezeski
Hi Glory,

On Tue, Feb 11, 2014 at 1:29 AM, Glory Lo gloryl...@gmail.com wrote:


 While indexing it seem to run fine part way.. then I noticed it hangs (it
 freezed my machine on a couple of attempts on linux mint 13).  Then it
 crashes.  I have 3 nodes running and I only tried indexing one of them
 doing a search-cmd mybucket dev1/data/leveldb


What was the process for indexing? How much data were you indexing? What
content-type? How big is each object? What is your schema?


 My crash log has multiple errors of different sorts which I haven't
 discern yet.  However, the last errors w/ a close timestamp are as follows
 which mentions some timeouts (likely with the freeze):


It's hard to discern ripple effect errors from the origin error. I see
some stuff that is indicative of disk corruption but there's a good chance
that only happened because some other error caused merge_index to hard
crash. Could you attach a tar.gz of all your logs?



 2014-02-08 23:15:53 =ERROR REPORT
 Error in process 0.2799.1 on node 'dev1@127.0.0.1' with exit value:
 {badarg,[{ets,lookup,[145752322,{1118962191081472546749696200048404186924073353216,'
 dev2@127.0.0.1
 '}],[]},{riak_search_client,'-process_terms_1/4-fun-2-',3,[{file,src/riak_search_client.erl},{line,295}]},{riak_search_utils,'-ptransform/2-fun-0-',2,[{file,src/riak_search_utils


This is an error finding the temporary ETS table for building the postings
list. That's a really interesting error to have and makes me wonder if you
somehow hit the ETS system limit. I'm not even sure that is possible given
how high we've raised the default limit.



 2014-02-08 23:18:46 =ERROR REPORT
 Error in process 0.2350.1 on node 'dev1@127.0.0.1' with exit value:
 {terminated,[{io,format,[17869.23.0,DEBUG: ~p:~p - ~p~n~n
 ~p~n~n,[riak_search_dir_indexer,194,{ error , Type , Error , erlang :
 get_stacktrace ( )
 },{error,error,{case_clause,{error,timeout}},[{riak_search_client,'-index_docs/1-fun-0-'...


I'm actually a bit baffled as to exactly what this trace is saying. I think more
detail might be in the error.log.



 2014-02-08 23:20:00 =ERROR REPORT
 Error in process 0.4231.1 on node 'dev1@127.0.0.1' with exit value:
 {{case_clause,{data,4711}},[{cpu_sup,get_uint32_measurement,2,[{file,cpu_sup.erl},{line,227}]},{cpu_sup,measurement_server_loop,1,[{file,cpu_sup.erl},{line,585}]}]}


Yikes, this looks really bad and makes me wonder if this is an environment
issue as this error should not be related to search.



 2014-02-08 23:23:37 =ERROR REPORT
 Error in process 0.6359.1 on node 'dev1@127.0.0.1' with exit value:
 {badarg,[{erlang,binary_to_term,[31359
 bytes],[]},{mi_segment,iterate_all_bytes,2,[{file,src/mi_segment.erl},{line,167}]},{mi_segment_writer,from_iterator,4,[{file,src/mi_segment_writer.erl},{line,102}]},{mi_segment_writer,from_iterator...


This is typically what you see when data corruption occurs but it's hard to
say if data corruption caused the other errors or the other errors caused
corruption.





 2014-02-08 23:24:58 =ERROR REPORT
 ** State machine 0.3211.0 terminating
 ** Last message in was {'EXIT',0.168.0,shutdown}
 ** When State == active
 **  Data  ==
 {state,1438665674247607560106752257205091097473808596992,riak_search_vnode,{vstate,1438665674247607560106752257205091097473808596992,merge_index_backend,{state,1438665674247607560106752257205091097473808596992,0.3212.0}},undefined,none,undefined,undefined,0.3221.0,{pool,riak_search_worker,2,[]},undefined,86616}
 ** Reason for termination =
 ** {timeout,{gen_server,call,[0.3212.0,stop]}}
 2014-02-08 23:24:58 =CRASH REPORT
   crasher:
 initial call: riak_core_vnode:init/1
 pid: 0.3211.0
 registered_name: []
 exception exit:
 {{timeout,{gen_server,call,[0.3212.0,stop]}},[{gen_fsm,terminate,7,[{file,gen_fsm.erl},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,proc_lib.erl},{line,227}]}]}
 ancestors: [riak_core_vnode_sup,riak_core_sup,0.162.0]
 messages:
 [{'EXIT',0.3221.0,shutdown},{#Ref0.0.1.215952,ok},{'EXIT',0.3212.0,normal}]
 links: []
 dictionary: [{random_seed,{27839,21123,25074}}]
 trap_exit: true
 status: running
 heap_size: 46368
 stack_size: 24
 reductions: 24758
   neighbours:


This is one of the riak_search vnodes crashing because its merge_index
process crashed. Which is expected given the circumstances.


 2014-02-08 23:24:58 =ERROR REPORT
 ** State machine 0.5392.1 terminating
 ** Last message in was
 {'$gen_sync_all_state_event',{0.5390.1,#Ref0.0.1.215861},{shutdown,6}}
 ** When State == ready
 **  Data  == {state,{[],[]},0.5393.1,[],undefined}
 ** Reason for termination =
 ** {timeout,{gen_fsm,sync_send_all_state_event,[0.5393.1,stop]}}
 2014-02-08 23:24:58 =CRASH REPORT
   crasher:
 initial call: riak_core_vnode_worker_pool:init/1
 pid: 0.5392.1
 registered_name: []
 exception exit:
 

Re: Search schemas in 2.0pre11

2014-02-06 Thread Ryan Zezeski
On Tue, Feb 4, 2014 at 4:38 PM, Jeremy Pierre j.14...@gmail.com wrote:

 Hi Eric,

 Thanks very much - getting a 405 response for that curl command though.
  POST to same endpoint yields the following:


The schema resource does not accept POST requests. Only PUT and GET.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna and array of JSON documents

2014-02-06 Thread Ryan Zezeski
Hi Srdjan,


On Mon, Feb 3, 2014 at 12:06 PM, Srdjan Pejic spe...@gmail.com wrote:


 [{viewer_id_s=004615eb-5c0e-4c4a-890c-c6fc29e3fc56,
 video_time_i=475, type_s=joined},
 {viewer_id_s=635dcd2d-fdeb-46c1-9920-803ccdd6176b,
 video_time_i=522, type_s=joined},
 {viewer_id_s=04b3cec7-6f37-4840-b1b6-eff4c16dd273,
 video_time_i=159, type_s=joined},
 {viewer_id_s=6ce3da5f-b598-4b1c-abf0-38ba92fa15fb,
 video_time_i=393, type_s=upvote}]


 My question to you is how can I search this array of documents through
 Yokozuna/Solr? Currently, I get 0 results back, which I suspect is because
 the actual JSON data is nested in an array and Yokozuna doesn't index that
 in an expected way.


Assuming you are using the default schema, the issue is that you are using
non multi-valued fields and thus this data is failing to index. If you
check your console.log you should see errors with the string "multiple
values encountered for non multiValued field" in them. Try changing your
field names to the following:

viewer_id_s = viewer_id_ss
video_time_i = video_time_is
type_s = type_ss

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search and Yokozuna Backup Strategy

2014-01-27 Thread Ryan Zezeski
Hi Elias,


On Mon, Jan 27, 2014 at 2:40 PM, Elias Levy fearsome.lucid...@gmail.comwrote:



 Any comments on the backup strategy for Yokozuna?  Will it make use of
 Solr's Replication Handler, or something more lower level?  Will the node
 need to be offline to backup it up?


There is no use of any Solr replication code--at all. Yokozuna (new Riak
Search, yes I know the naming is confusing) can be thought of as secondary
data to KV. It is a collection of index postings based on the canonical and
authoritative KV data. Therefore, the postings can always be rebuilt from
the KV data. AAE provides an automatic integrity check between the KV
object and its postings that is run constantly in the background.

Given that, there are two ways I see backup/restore working.

1. From a local, file-level perspective. You take a snapshot of your node's
local filesystem and use that as a save point in case of future corruption.
In this case you don't worry yourself with cluster-wide consistency, it's
just a local backup. If you ever have to restore this data then AAE and
read-repair can deal with any divergence that is caused by using the
restore. Although, you could end up with resurrected data depending on your
delete policy and age of backup. Another issue is that various parts of
Riak that write to disk may not be snapshot safe. It's already been
discussed how leveldb isn't. I'm willing to bet Lucene isn't either. In any
case where a logical operation requires multiple filesystem writes, you have
to worry about the snapshot occurring in the middle of the logical
operation. I have no idea how Lucene would deal with snapshots that occur
at the wrong time. I'm unsure how good it is at detecting, and more
importantly, recovering from corruption. This is one reason why AAE is so
important. I do demos at my talks where I literally rm -rf the entire index
dir and AAE rebuilds it from scratch. This will not necessarily be a fast
operation in a real production database but it's good to know that the data
can always be re-built from the KV data. If you can cover the KV data then
you can always rebuild the indexes.

2. Backup/restore as a logical operation in Riak itself. We currently have
a backup/restore but from what I hear it has various issues and needs to be
fixed/replaced. But, assuming there was a backup command that worked I
suppose you could try playing games with Yokozuna. Perhaps Yokozuna could
freeze an index from merging segments and backup important files. Perhaps
there are replication hooks built into Solr/Lucene that could be used. I'm
not sure. I'm handwaving on purpose because I'm sure there are multiple
avenues to explore. However, another option is to punt. As I said above the
indexes can be rebuilt from the KV data. So if you have a backup that only
works for KV then the restore operation would simply re-index the data as
it is written. Yokozuna currently uses a low-level hook inside the KV vnode
that notices any time that KV data is written so it should just work
assuming restore goes through the KV code path and doesn't build files
directly.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search and Yokozuna Backup Strategy

2014-01-27 Thread Ryan Zezeski
On Mon, Jan 27, 2014 at 4:02 PM, Elias Levy fearsome.lucid...@gmail.comwrote:


 So it would appear to do it properly, we'd need some support from Yokozuna
 to take the snapshot, return a list of files to backup or back them up
 itself (hard links?), and then to allow an application to signal it to
 release the snapshot or release it itself if its doing the backup.


If you want to do local, file-based backups, yes. It would appear Yokozuna
needs code added in order to backup the Lucene directories without issue.
In the interim there is still the option of only backing up the KV data and
rebuilding the indexes from that.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Timeouts on Riak Search

2014-01-27 Thread Ryan Zezeski
On Sun, Jan 26, 2014 at 7:41 PM, ender extr...@gmail.com wrote:

 I am continuously getting the following types of errors in my riak logs:

 2014-01-27 00:06:39.735 [error] 0.220.0 Supervisor riak_pipe_builder_sup
 had child undefined started with {riak_pipe_builder,start_link,undefined}
 at 0.18590.125 exit with reason
 {{modfun,riak_search,mapred_search,[Mediastream,(type:image
 type:video type:FacebookPost) AND (teamSlug:nba.san-antonio-spurs
 home_teamSlug:nba.san-antonio-spurs
 away_teamSlug:nba.san-antonio-spurs)]},error,{badmatch,{error,timeout}},[{riak_search,mapred_search,3,[{file,src/riak_search.erl},{line,55}]},{riak_kv_mrc_pipe,send_inputs,3,[{file,src/riak_kv_mrc_pipe.erl},{line,627}]},{riak_kv_mrc_pipe,'-send_inputs_async/3-fun-0-',3,[{file,src/riak_kv_mrc_pipe.erl},{line,557}]}]}
 in context child_terminated

 I have just started using Riak last week, so most likely it's user error
 on my part.  Would be grateful for any assistance!

 I have also attached some additional info (app.config, log files etc) to
 this email.

 Thanks,

 Satish


Satish,

I see you are doing a disjunction search across the type field `(type:image
type:video type:FacebookPost)`. How many documents match that sub-query? If
it is over 100k then legacy Riak Search will fail to return on the query
causing it to timeout. In general, legacy Riak Search has issues with
larger result sets. In 2.0 there is a new version of Riak Search (code name
Yokozuna) which should have much less issues with larger result sets. You
can try playing with it via the 2.0.0pre11 download.

http://docs.basho.com/riak/2.0.0pre11/downloads/

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Timeouts on Riak Search

2014-01-27 Thread Ryan Zezeski
On Mon, Jan 27, 2014 at 5:20 PM, Ryan Zezeski rzeze...@basho.com wrote:



 I see you are doing a disjunction search across the type field `(type:image
 type:video type:FacebookPost)`. How many documents match that sub-query?
 If it is over 100k then legacy Riak Search will fail to return on the query
 causing it to timeout. In general, legacy Riak Search has issues with
 larger result sets. In 2.0 there is a new version of Riak Search (code name
 Yokozuna) which should have much less issues with larger result sets. You
 can try playing with it via the 2.0.0pre11 download.

  http://docs.basho.com/riak/2.0.0pre11/downloads/

 -Z


I meant to include these links as well:

https://github.com/basho/yokozuna#getting-started

https://github.com/basho/yokozuna/tree/develop/docs
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Split index with Riak 2.0 git (jan 15th) on a single dev node cluster

2014-01-21 Thread Ryan Zezeski
John,

1. What did you use to load the data? Do you have a script?

2. What content-type is the data?

3. Do you see any errors in the log directory? Check error.log and solr.log.

4. Do you get any results for the query q=_yz_err:1

5. Did you wait at least 1 second before running the queries?

6. What version of Riak are you using?

7. Are you by chance using curl to run these test queries? If so can you
please copy/paste or gist the entire curl input and output for each of the
3 different results?

-Z


On Tue, Jan 21, 2014 at 1:46 PM, John O'Brien j...@boardom.ca wrote:

 Issue:

 When running searchs against a single dev node cluster, pre-populated
 with 1000 keys, bitcask backend, search=on and a /search/svan?q=*
 search URI, the solr response is coming back with three different
 resultsets, 330 values, the other 354, the other 345. The range of
 keys 0-1000 are split in no obvious pattern between the 3 result
 shards..

 Anyone have any clue as to what I may have messed up in the config? I
 assume this is not expected behaviour.

 Other than that, it works great. ;)

 Cheers,

 John

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.13.0

2014-01-09 Thread Ryan Zezeski
Riak Users,

It was a little late due to the holidays but Yokozuna 0.13.0 is
here. This release brings an upgrade to Solr, support for indexing
Riak Data Structures, the ability to reload indexes via `riak attach`,
and includes a query performance boost. See the release notes for more
details.

https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0130

Given the number of breaking changes since the Riak 2.0.0pre5 release
I recommend using the 0.13.0 source package until a new Riak
pre-release is made. This way the documentation can be followed without
trouble. See the install instructions for more detail.

https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md#source-package

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Cluster restarted and doesn’t respond to queries

2013-12-23 Thread Ryan Zezeski
On Mon, Dec 23, 2013 at 12:35 PM, Justin Lambert jlamb...@letsevenup.comwrote:


 I do see some errors in the error.log, but the referenced directories
 don’t appear to exist:
 2013-12-23 16:52:33.290 [error]
 0.1721.0@riak_kv_bitcask_backend:move_unused_dirs:607 Failed to move
 unused data directory
 ./data/leveldb/388211372416021087647853783690262677096107081728. Reason:
 eexist


This error is interesting. It is coming from the bitcask backend but trying
to read a leveldb directory. My guess is something happened with your
configuration when you upgraded. What does your app.config look like?

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [Confusing search docs] Enabling search on bucket in Riak 2.0

2013-11-27 Thread Ryan Zezeski
If a field isn't specified then it will default to 'text', which should
work for plain text. But just as a sanity check I'd also be curious to see
the results of the *:* query.
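
For example, against the index in this thread that would look something like:

curl 'http://192.168.1.10:8098/search/allLogs?q=*:*'

If that also comes back empty then nothing has been indexed at all, which
points at the association/indexing side rather than the query itself.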


On Wed, Nov 27, 2013 at 11:15 AM, Eric Redmond eredm...@basho.com wrote:

 That is not a valid solr query. You need to search by field:value. Try:

 http://192.168.1.10:8098/solr/logs/select?q=*:*

 Eric

 On Nov 27, 2013, at 7:23 AM, Kartik Thakore kthak...@aimed.cc wrote:

 Cool. I did the data activate and emptied out the bucket and set the props
 and created a different index. Still no go

 Here is the data:
 [2013-11-27T15:21:30] [ERROR] [192.168.1.102] [zach.scratchd.ca] [0] [
 thakore.kar...@gmail.com] test

 Here is the search:
 http://192.168.1.10:8098/solr/logs/select?q=*

 http://192.168.1.10:8098/solr/logs/select?q=test


 No results found


 On Tue, Nov 26, 2013 at 8:56 PM, Ryan Zezeski rzeze...@basho.com wrote:

 Kartik,

 The pre7 tag incorporates the new bucket type integration. Bucket types
 are a new feature in 2.0 that provide additional namespace support and more
 efficient bucket properties (good for when you have many buckets with
 custom properties). The particular code you are running against requires
 that for data to be indexed in Yokozuna it must be stored under a
 non-default bucket type. Since you are not specifying a type the logs
 bucket lives under the default type where `yz_index` will not be applied.
 This will be changed for 2.0 so that any type of bucket may be indexed. In
 the meantime, try this:

 riak-admin bucket-type create data '{"props":{}}'
 riak-admin bucket-type activate data

 curl -X PUT -H 'content-type: application/json' '
 http://host:port/types/data/buckets/logs/props' -d
  '{"props":{"yz_index":"allLogs"}}'

 That above will change soon as well. We are attempting to rename most
 user facing parts of Yokozuna to just search. This means that `yz_index`
 will soon become `search_index`. Sorry for the inconvenience as things are
 in a bit of flux leading up to 2.0.

 -Z


 On Tue, Nov 26, 2013 at 5:59 PM, Kartik Thakore kthak...@aimed.ccwrote:

 So finally got a chance to try this and I am running into issues (I am
 on Riak 2.0pre7) btw.

 I have yz turned on:

 http://192.168.1.10:8098/yz

 I created the index with:

 $ curl -i  http://192.168.1.10:8098/yz/index/allLogs

 HTTP/1.1 200 OK
 Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
 Date: Tue, 26 Nov 2013 22:51:15 GMT

 Content-Type: application/json
 Content-Length: 41

 {name:allLogs,schema:_yz_default}


 And associated the search to the bucket probs:

 http://192.168.1.10:8098/buckets/logs/props

 {

 props:
 {

 allow_mult: true,
 basic_quorum: false,
 big_vclock: 50,
 chash_keyfun:
 {

 mod: riak_core_util,
 fun: chash_std_keyfun

 },
 dw: quorum,
 last_write_wins: false,
 linkfun:
 {

 mod: riak_kv_wm_link_walker,
 fun: mapreduce_linkfun

 },
 n_val: 3,
 name: logs,
 notfound_ok: true,
 old_vclock: 86400,
 postcommit: [ ],
 pr: 0,
 precommit: [ ],
 pw: 0,
 r: quorum,
 rw: quorum,
 small_vclock: 50,
 w: quorum,
 young_vclock: 20,
 yz_index: allLogs

 }

 }

 I put in a text/plain entry with:


 http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=47ffPuWSln7VhlTl02raJA

 [2013-11-26T22:43:26] [ERROR] [192.168.1.102] [0] [
 thakore.kar...@gmail.com] test


 http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=6IYwwPE27eUbs8ThaSOcTC

 [2013-11-26T22:39:59] [ERROR] [192.168.1.102] [0] [
 thakore.kar...@gmail.com] test



 But when I search:

 http://192.168.1.10:8098/search/allLogs?q=*

 No results

 http://192.168.1.10:8098/search/allLogs?q=test

 No results


 Whats going on?








 On Thu, Nov 21, 2013 at 12:45 PM, Ryan Zezeski rzeze...@basho.com
 wrote:
 
 
 
 
  On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc
 wrote:
 
  Thank you.
 
  I am creating indexes with:
 
  curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \
 -H 'content-type: application/json' \
-d '{schema : _yz_default, bucket : logs }'
 
 
  But when I check the index with:
 
   curl -i  http://192.168.1.10:8098/yz/index/allLogs
 
  It drops the bucket association
 
  HTTP/1.1 200 OK
  Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
  Date: Wed, 20 Nov 2013 20:45:21 GMT
  Content-Type: application/json
  Content-Length: 41
 
  {name:allLogs,schema:_yz_default}
 
 
  Sorry, that documentation is out of date.
 
  To associate an index to a bucket you need to set the bucket's
 properties.
 
  curl -XPUT -H 'content-type: application/json' '
 http://localhost:8098/buckets/logs/props' -d
  '{"props":{"yz_index":"allLogs"}}'
 
  You can perform a GET on that same resource to check the yz_index
 property is set.
 
 
  Also
 
  what is going on here
 
  curl -XPUT -H'content-type:application/json'
 http://localhost:8098/buckets/people/keys/me \
  -d'{ name_s : kartik }'
 
  Why not:
  curl -XPUT -H'content-type:application/json'
 http://localhost:8098/rial/people/me \
  -d'{ name_s : kartik

Re: [Confusing search docs] Enabling search on bucket in Riak 2.0

2013-11-26 Thread Ryan Zezeski
Kartik,

The pre7 tag incorporates the new bucket type integration. Bucket types are
a new feature in 2.0 that provide additional namespace support and more
efficient bucket properties (good for when you have many buckets with
custom properties). The particular code you are running against requires
that for data to be indexed in Yokozuna it must be stored under a
non-default bucket type. Since you are not specifying a type the logs
bucket lives under the default type where `yz_index` will not be applied.
This will be changed for 2.0 so that any type of bucket may be indexed. In
the meantime, try this:

riak-admin bucket-type create data '{"props":{}}'
riak-admin bucket-type activate data

curl -X PUT -H 'content-type: application/json'
'http://host:port/types/data/buckets/logs/props'
-d '{"props":{"yz_index":"allLogs"}}'

The above will change soon as well. We are attempting to rename most user
facing parts of Yokozuna to just search. This means that `yz_index` will
soon become `search_index`. Sorry for the inconvenience as things are in a
bit of flux leading up to 2.0.

-Z


On Tue, Nov 26, 2013 at 5:59 PM, Kartik Thakore kthak...@aimed.cc wrote:

 So finally got a chance to try this and I am running into issues (I am on
 Riak 2.0pre7) btw.

 I have yz turned on:

 http://192.168.1.10:8098/yz

 I created the index with:

 $ curl -i  http://192.168.1.10:8098/yz/index/allLogs

 HTTP/1.1 200 OK
 Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
 Date: Tue, 26 Nov 2013 22:51:15 GMT

 Content-Type: application/json
 Content-Length: 41

 {name:allLogs,schema:_yz_default}


 And associated the search to the bucket probs:

 http://192.168.1.10:8098/buckets/logs/props

 {

 props:
 {

 allow_mult: true,
 basic_quorum: false,
 big_vclock: 50,
 chash_keyfun:
 {

 mod: riak_core_util,
 fun: chash_std_keyfun

 },
 dw: quorum,
 last_write_wins: false,
 linkfun:
 {

 mod: riak_kv_wm_link_walker,
 fun: mapreduce_linkfun

 },
 n_val: 3,
 name: logs,
 notfound_ok: true,
 old_vclock: 86400,
 postcommit: [ ],
 pr: 0,
 precommit: [ ],
 pw: 0,
 r: quorum,
 rw: quorum,
 small_vclock: 50,
 w: quorum,
 young_vclock: 20,
 yz_index: allLogs

 }

 }

 I put in a text/plain entry with:


 http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=47ffPuWSln7VhlTl02raJA

 [2013-11-26T22:43:26] [ERROR] [192.168.1.102] [0] [
 thakore.kar...@gmail.com] test


 http://192.168.1.10:8098/riak/logs/26-11-2013T2260?vtag=6IYwwPE27eUbs8ThaSOcTC

 [2013-11-26T22:39:59] [ERROR] [192.168.1.102] [0] [
 thakore.kar...@gmail.com] test



 But when I search:

 http://192.168.1.10:8098/search/allLogs?q=*

 No results

 http://192.168.1.10:8098/search/allLogs?q=test

 No results


 Whats going on?








 On Thu, Nov 21, 2013 at 12:45 PM, Ryan Zezeski rzeze...@basho.com wrote:
 
 
 
 
  On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc
 wrote:
 
  Thank you.
 
  I am creating indexes with:
 
  curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \
 -H 'content-type: application/json' \
-d '{schema : _yz_default, bucket : logs }'
 
 
  But when I check the index with:
 
   curl -i  http://192.168.1.10:8098/yz/index/allLogs
 
  It drops the bucket association
 
  HTTP/1.1 200 OK
  Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
  Date: Wed, 20 Nov 2013 20:45:21 GMT
  Content-Type: application/json
  Content-Length: 41
 
  {name:allLogs,schema:_yz_default}
 
 
  Sorry, that documentation is out of date.
 
  To associate an index to a bucket you need to set the bucket's
 properties.
 
  curl -XPUT -H 'content-type: application/json' '
 http://localhost:8098/buckets/logs/props' -d
  '{"props":{"yz_index":"allLogs"}}'
 
  You can perform a GET on that same resource to check the yz_index
 property is set.
 
 
  Also
 
  what is going on here
 
  curl -XPUT -H'content-type:application/json'
 http://localhost:8098/buckets/people/keys/me \
  -d'{ name_s : kartik }'
 
  Why not:
  curl -XPUT -H'content-type:application/json'
 http://localhost:8098/rial/people/me \
  -d'{ name_s : kartik }'
 
 
 
  In Riak 1.0.0 we changed the resource from '/riak/bucket/key' to
 '/buckets/bucket/keys/key'. We were supposed to deprecate and
 eventually remove the old resource but we never did. You can still use the
 old style but I would recommend using the new style as it is what we use in
 official docs and there is a chance perhaps the old resources don't stay up
 to date with the latest features.
 
 
  -Z

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search Crashes

2013-11-21 Thread Ryan Zezeski
On Wed, Nov 20, 2013 at 2:38 PM, Gabriel Littman g...@connectv.com wrote:


 1) We are installed via deb package
  ii  riak 1.4.1-1
 Riak is a distributed data store


There's a 1.4.2 out but your issue doesn't seem to have anything do with a
specific 1.4.1 bug.



 2) We did recently upgrade our to riak python library 2.0 but I also have
 a cluster still on the 1.4 client that has similar problems.


Okay, so for now we assume the client upgrade didn't cause the issues
either.



 3) We less recently upgraded riak itself from 1.2.x to 1.4.  We ended up
 starting with an empty riak store in the processes.  Honestly we've had
 many problems with search index even under 1.2.  Mostly riak would get into
 a state where it would continuously crash after startup until we
 deleted /var/lib/riak/merge_index on the node and then rebuilt the search
 index via read/write.  The particular problems I'm having now I cannot
 confirm if they were happening under riak 1.2 or not.


The 1.2 issues may very well have been caused by a corruption bug that was
fixed in 1.4.0 [1].



 looks like allow_mult is false, but I just confirmed with my colleague
 that *it was previously set to true* so it could be that we have a hold
 over issue from that.
 $ curl 'http://10.1.2.95:8098/buckets/ctv_tvdata/props'

 {props:{allow_mult:false,basic_quorum:false,big_vclock:50,chash_keyfun:{mod:riak_core_util,fun:chash_std_keyfun},dw:0,last_write_wins:false,linkfun:{mod:riak_kv_wm_link_walker,fun:mapreduce_linkfun},n_val:3,name:ctv_tvdata,notfound_ok:false,old_vclock:86400,postcommit:[],pr:0,precommit:[{fun:precommit,mod:riak_search_kv_hook},{mod:riak_search_kv_hook,fun:precommit}],pw:0,r:1,rw:1,search:true,small_vclock:50,w:1,young_vclock:20}}


So after setting allow_mult back to false you'd have to make sure to
resolve any siblings but that should be done automatically for you now that
allow_mult is false again. However, the commit hook will also crash if you
have allow_mult set to true on Riak Search's special proxy object bucket.
Looking at your original insert crash message I notice the problem is
actually with the proxy objets stored in this bucket [2]. What does the
following curl show you:

curl 'http://host:port/buckets/_rsid_ctv_tvdata/props'

I bet $5 it has allow_mult set to true. Try setting that to false and see
what happens.
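
If it does, something along these lines (host/port as in your earlier curl)
should flip it back:

curl -XPUT -H 'content-type: application/json' \
  'http://10.1.2.95:8098/buckets/_rsid_ctv_tvdata/props' \
  -d '{"props":{"allow_mult":false}}'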




 Since it is now set to false now would you have a suggestion on how to
 clear the problem?  (Delete merge_index?)


You shouldn't have to delete merge index files unless they are corrupted.
Let's see if we can fix your insert/index problem first. Then we can work
on search if it is still broken.

-Z


[1]: https://github.com/basho/merge_index/pull/30

[2]: It's not easy to see, but there is the atom 'riak_idx_doc' which
indicates this is a proxy object created by Riak Search. If you squint
hard enough you can see the analyzed fields as well. I should have looked
more closely the first time. This is not an obvious error. I wouldn't
expect many people to pick up on it.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [Confusing search docs] Enabling search on bucket in Riak 2.0

2013-11-21 Thread Ryan Zezeski
On Wed, Nov 20, 2013 at 3:48 PM, Kartik Thakore kthak...@aimed.cc wrote:

 Thank you.

 I am creating indexes with:

 curl -i -XPUT http://192.168.1.10:8098/yz/index/allLogs \
-H 'content-type: application/json' \
   -d '{schema : _yz_default, bucket : logs }'


 But when I check the index with:

  curl -i  http://192.168.1.10:8098/yz/index/allLogs

 It drops the bucket association

 HTTP/1.1 200 OK
 Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
 Date: Wed, 20 Nov 2013 20:45:21 GMT
 Content-Type: application/json
 Content-Length: 41

 {name:allLogs,schema:_yz_default}


Sorry, that documentation is out of date.

To associate an index to a bucket you need to set the bucket's properties.

curl -XPUT -H 'content-type: application/json' '
http://localhost:8098/buckets/logs/props' -d
'{"props":{"yz_index":"allLogs"}}'

You can perform a GET on that same resource to check the yz_index property
is set.
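
For example:

curl 'http://localhost:8098/buckets/logs/props'

and look for "yz_index":"allLogs" in the output.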


 Also

 what is going on here

 curl -XPUT -H'content-type:application/json'
 http://localhost:8098/buckets/people/keys/me \
 -d'{ name_s : kartik }'

 Why not:
 curl -XPUT -H'content-type:application/json'
 http://localhost:8098/rial/people/me \
 -d'{ name_s : kartik }'



In Riak 1.0.0 we changed the resource from '/riak/bucket/key' to
'/buckets/bucket/keys/key'. We were supposed to deprecate and
eventually remove the old resource but we never did. You can still use the
old style, but I recommend the new style since it is what we use in the
official docs, and the old resource may not stay up to date with the latest
features.
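
For example, these two requests store the same object:

# old style (still supported)
curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/riak/people/me' -d '{"name_s":"kartik"}'

# new style (recommended)
curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/buckets/people/keys/me' -d '{"name_s":"kartik"}'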


-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Map Reduce error

2013-11-20 Thread Ryan Zezeski
Roger,

Riak Search has a hardcoded max result set size of 100K items. It enforces
this to prevent blowing out memory and causing other issues. Riak Search
definitely has some issues when it comes to handling a use case like yours.

That said, our new Search solution in 2.0 (code named Yokozuna) should do a
lot better. Not only does it not have the hardcoded 100K limit but it
should also execute the queries faster. In some cases by 1-3 orders of
magnitude (10-1000x). At that point you're more likely to be slowed down by
the map-reduce. You might even be able to remove that stage by using stored
fields, but I'd need to know more about your use case.
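
For example, if the fields you need are marked stored="true" in the schema, a
single query can return the values directly instead of feeding keys into
map-reduce. The index name below is a placeholder and the field names are
taken from your query; _yz_rk holds the Riak key:

curl 'http://host:port/search/myindex?q=systemId:foo+AND+indexId:42&fl=_yz_rk,systemId,indexId&rows=100'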

I agree that current Riak (pre 2.0) is not a general search solution. Riak
Search can work very well but it requires some hand holding and careful
vigilance of how you index and query the data. I feel that the new Search
(Yokozuna) fixes this in many ways. In general, it has more robust search
support and lower, more consistent latency. Yokozuna would also have no
issues dealing with 1 million objects. My micro benchmark that I run is
1-10 million objects. Granted, they are small plain-text objects, but I'm
fairly confident it would work with your 1 million objects.

I realize that Riak 2.0, and thus the new search functionality, is not out
yet. We have an early release, Riak 2.0.0pre5 [1], that you can try. I also
do monthly releases of the new search functionality [2]. So if you want to
kick the tires I can point you in the right direction.

-Z

[1]: http://docs.basho.com/riak/2.0.0pre5/downloads/

[2]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md


On Wed, Nov 20, 2013 at 11:45 AM, Roger Diller 
ro...@flexrentalsolutions.com wrote:

 I could dig up all our nitty gritty Riak details but I don't think that
 will help really.

 The point I think is this: Using search map reduce is not a viable way to
 do real time search queries. Especially ones that may have 2000+ plus
 results each. Couple that with search requests coming in every few seconds
 from 300+ customer app instances and you literally bring Riak to it's
 knees.

 Not that Riak is the problem really, it's just we are using it in a way it
 was not designed for. In essence, we are using Riak as a search engine for
 our application data. Correct me if I'm wrong but Riak is more for storing
 large amounts of KV data, but not really for finding that data in a search
 sense.

 Am I missing something here? Is there a viable way for doing real time
 search queries on a bucket with 1 million keys?


 On Mon, Nov 18, 2013 at 5:29 PM, Alexander Sicular sicul...@gmail.comwrote:

 More info please...

 Version
 Current config
 Hardware
 Data size
 Search Schema
 Etc.

 But I would probably say that your search is returning too many keys to
 your mr. More inline.

 @siculars
 http://siculars.posthaven.com

 Sent from my iRotaryPhone

 On Nov 18, 2013, at 13:59, Roger Diller ro...@flexrentalsolutions.com
 wrote:

 Using the Riak Java client, I am executing a search map reduce like this:

 MapReduceResult result = riakClient.mapReduce(SEARCH_BUCKET,
 search).execute();


 ^is this part a typo. Cause otherwise it looks like you do a smr, set
 the search and then another smr.


 String search = systemId: + systemName +  AND indexId: + indexId;

 MapReduceResult result = riakClient.mapReduce(SEARCH_BUCKET,
 search).execute();

 This worked fine when the bucket contained a few thousand keys. Now that
 we have far more data stored in the bucket (at least 250K keys), it's
 throwing this generic error:

 com.basho.riak.client.RiakException: java.io.IOException:
 {error:map_reduce_error}

 We've also noticed that storing new key/values in the bucket has slowed
 WAY down.

 Any idea what's going on?


 Your data set is incorrectly sized to your production config.

 Are there limitations to Search Map Reduce?


 Certainly

 Are there configuration options that need changed?


 Possibly

 Any help would be greatly appreciated.


 --
 Roger Diller
 Flex Rental Solutions, LLC
 Email: ro...@flexrentalsolutions.com
 Skype: rogerdiller
 Time Zone: Eastern Time

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




 --
 Roger Diller
 Flex Rental Solutions, LLC
 Email: ro...@flexrentalsolutions.com
 Skype: rogerdiller
 Time Zone: Eastern Time

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search Crashes

2013-11-20 Thread Ryan Zezeski
Hi Gabriel,

First, let me verify a few things.

1. You are on Riak 1.4? Which patch version? 1.4.2?

2. You recently upgraded you client? Did you have any of these failures
before upgrading the client?

3. Have you made any other changes between the time your system was working
and the time it started exhibiting these failures? For example, set
allow_mult=true?

Given that you are having 'badmatch' hook crashes during insert I have the
suspicion that allow_mult was recently changed to true as the Riak Search
hook cannot deal with siblings. What does the following curl show:

curl 'http://host:port/buckets/ctv_tvdata/props'

If that has 'allow_mult: true' then that is your issue.

As for your search operations. I'm not sure why they are failing. If you
want you could tar.gz all the logs for each node and email that to me.

-Z


On Mon, Nov 18, 2013 at 7:00 PM, Gabriel Littman g...@connectv.com wrote:

 Hi All,

 We've been working with a search enabled bucket in riak for a while now
 and off and on it has been giving us trouble.  In the past it has been
 solved by reindexing all the data by just reading and writing the data back
 into riak.  But even this is failing now on some input data.  Any
 help/insite would be greatly appreciated.

 We are on riak 1.4
 We have recently switched to riak python api 2.0

 smrtv@fre-prod-svr15:~$ python
 Python 2.7.3 (default, Aug  1 2012, 05:14:39)
 [GCC 4.6.3] on linux2
 Type help, copyright, credits or license for more information.
  import riak
  r = riak.RiakClient()
  b = r.bucket('ctv_tvdata')
  o = b.get('/data/v2/search_show/TMS.Show.9838380')
  o.data
 {'type': 'show', 'expires': '99', 'subject_name': 'Monsters vs.
 Aliens', 'sub_type': 'Series', 'topic':
 '__ref--/data/v2/topic/TMS.Show.9838380:r1384276501.854346', 'person':
 '__None__', 'searchable_key': 'aliens vs monstersvsaliens monsters',
 'date': '2013-11-23', 'sport': '__None__', 'genre': 'Children', 'id':
 '/data/v2/search_show/TMS.Show.9838380'}
  o.store()
 Traceback (most recent call last):
   File stdin, line 1, in module
   File /usr/local/lib/python2.7/dist-packages/riak/riak_object.py, line
 281, in store
 timeout=timeout)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 127, in wrapper
 return self._with_retries(pool, thunk)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 69, in _with_retries
 return fn(transport)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 125, in thunk
 return fn(self, transport, *args, **kwargs)
   File /usr/local/lib/python2.7/dist-packages/riak/client/operations.py,
 line 289, in put
 timeout=timeout)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/http/transport.py,
 line 144, in put
 return self._parse_body(robj, response, [200, 201, 204, 300])
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/http/codec.py,
 line 64, in _parse_body
 self.check_http_code(status, expected_statuses)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/http/transport.py,
 line 446, in check_http_code
 (expected_statuses, status))
 Exception: Expected status [200, 201, 204, 300], received 500

 Using protocol buffs gives an erlang riak_search_kv_hook,precommit,error:

  r = riak.RiakClent(protocol='pcb')
 Traceback (most recent call last):
   File stdin, line 1, in module
 AttributeError: 'module' object has no attribute 'RiakClent'
  r = riak.RiakClient(protocol='pcb')
 Traceback (most recent call last):
   File stdin, line 1, in module
   File /usr/local/lib/python2.7/dist-packages/riak/client/__init__.py,
 line 99, in __init__
 self.protocol = protocol or 'http'
   File /usr/local/lib/python2.7/dist-packages/riak/client/__init__.py,
 line 118, in _set_protocol
 repr(self.PROTOCOLS))
 ValueError: protocol option is invalid, must be one of ['http', 'https',
 'pbc']
  r = riak.RiakClient(protocol='pbc')
  b = r.bucket('ctv_tvdata')
  o = b.get('/data/v2/search_show/TMS.Show.9838380')
  o.store()
 Traceback (most recent call last):
   File stdin, line 1, in module
   File /usr/local/lib/python2.7/dist-packages/riak/riak_object.py, line
 281, in store
 timeout=timeout)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 127, in wrapper
 return self._with_retries(pool, thunk)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 69, in _with_retries
 return fn(transport)
   File /usr/local/lib/python2.7/dist-packages/riak/client/transport.py,
 line 125, in thunk
 return fn(self, transport, *args, **kwargs)
   File /usr/local/lib/python2.7/dist-packages/riak/client/operations.py,
 line 289, in put
 timeout=timeout)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/transport.py,
 line 194, in put
 MSG_CODE_PUT_RESP)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py,
 line 43, in _request
   

Re: Riak Yokozuna and a schema

2013-11-15 Thread Ryan Zezeski
Leif,

I quickly wrote up a gist to show how you can use a custom schema with your
index and associate it with multiple buckets. Be warned that the current
version of Riak/Yokozuna uses bucket properties for storing the index
association. These are stored in the ring and have a known limitation. The
next version (0.12.0) will use a much more efficient mechanism for storing
association but will also change some of the steps outlined in that gist.
The Yokozuna API is still a bit of a moving target leading up to the Riak
2.0 Final release.

https://gist.github.com/rzezeski/7488192
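
In outline the gist does roughly the following (the index, schema, and bucket
names below are made up, and the /yz/schema resource name may shift as the API
is renamed; the gist is the authoritative version):

# 1. upload a custom schema (XML, usually a copy of the default with your fields added)
curl -XPUT -H 'content-type: application/xml' \
  'http://localhost:8098/yz/schema/books' --data-binary @books_schema.xml

# 2. create an index that uses that schema
curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/yz/index/books' -d '{"schema":"books"}'

# 3. associate the one index with each customer bucket
curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/buckets/customer_a/props' -d '{"props":{"yz_index":"books"}}'
curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/buckets/customer_b/props' -d '{"props":{"yz_index":"books"}}'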

-Z


On Fri, Nov 15, 2013 at 7:11 AM, Leif Gensert l...@propertybase.com wrote:

 Hello everyone,

 I am currently evaluating Riak for a project of ours.

 Here are the requirements in a nutshell:

 - We get various customer data as json with different field names (let’s
 just pretend that we have books).
 - We need to store these data as it comes (JSON with the original field
 names).
 - We need to have a consistent search index with fields specified by us.

 Example:

 Customer A:

 {
   book_title: ‘Alice in wonderland’,
   num_of_pages: 314,
 }

 Customer B:

 {
   book_name: ’Sherlock Holmes in the Hound of the Baskervilles’,
   number_of_pages: 164,
 }

 So far so good.

 This data need to be stored for example like this:

 {
   title: ‘Alice in wonderland’,
   pages: 314,
 }
 {
   title: ’Sherlock Holmes in the Hound of the Baskervilles’,
   pages: 164,
 }

 My thought was this:

 - Store data from each customer to a different bucket.
 - Index the data from the document to Yokozuna (after all, Solr has a
 schema so we could utilize that)

 My question concerning this would be:

 What’s the best way to do this?

 So far the only tutorials I found concerning Yokozuna index the documents
 without a schema.

 best
 Leif
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna Schema Changes

2013-11-08 Thread Ryan Zezeski
Hi Jeremiah,

Yes. I very much want the ability to update the schema in 2.0. More
fundamental things have leap frogged it. Technically you can modify a
schema today but it has to be done by hand and is error prone.

-Z


On Fri, Nov 8, 2013 at 6:42 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com
 wrote:

 I notice that YZ issue 130 (support for schema updates) was created 5
 months ago and doesn't have any commits against it right now. Is this still
 on track to get pushed into the product as part of Riak 2.0 or has no work
 begun?

 Thanks
 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna Schema Changes

2013-11-08 Thread Ryan Zezeski
Absolutely not ideal. However, adding the ability to more easily mutate the
schema will come with a cost. Adding a field that wasn't there before and
you only want it indexed for newly written objects, easy. Adding a field
and you want to re-index your objects, a little trickier. Removing a field
that has hundreds, thousands, millions, etc of matching Solr documents;
better be careful. Changing the field type or analysis chain; now you are
asking for serious trouble.

I still plan to add the feature but mutating a schema must be done with
caution. I will probably just end up writing a bunch of scary documentation
to warn of the pitfalls :)

-Z


On Fri, Nov 8, 2013 at 6:56 PM, Jeremiah Peschka jeremiah.pesc...@gmail.com
 wrote:

 Yeah, I ran into some difficulties while trying to modify schema. Even
 after modifications I ended up having to do a rolling restart of the
 cluster to get YZ to pick up the new schema.

 Obviously a rolling restart of Riak isn't the biggest issue on earth, it's
 not ideal either.

 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop


 On Fri, Nov 8, 2013 at 3:54 PM, Ryan Zezeski rzeze...@basho.com wrote:

 Hi Jeremiah,

 Yes. I very much want the ability to update the schema in 2.0. More
 fundamental things have leap frogged it. Technically you can modify a
 schema today but it has to be done by hand and is error prone.

 -Z


 On Fri, Nov 8, 2013 at 6:42 PM, Jeremiah Peschka 
 jeremiah.pesc...@gmail.com wrote:

 I notice that YZ issue 130 (support for schema updates) was created 5
 months ago and doesn't have any commits against it right now. Is this still
 on track to get pushed into the product as part of Riak 2.0 or has no work
 begun?

 Thanks
 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Yokozuna 0.11.0

2013-11-08 Thread Ryan Zezeski
As promised here is documentation on how to use the new security features
with Yokozuna. Unfortunately, I also found a bug while making this document
so not everything will work as described until the next release.
Essentially the HTTP search authorization will always fail since it's
checking against the schema resource type rather than the index resource
(but protocol buffers should work). Also, some of the terminology used in
this document may change in the next few months as we polish things.

https://github.com/basho/yokozuna/blob/develop/docs/SECURITY.md

-Z


On Thu, Nov 7, 2013 at 9:29 AM, Ryan Zezeski rzeze...@basho.com wrote:

 Riak Users,

 Today I'm happy to announce the 0.11.0 release of Yokozuna.

 This release brings Riak Java Client support as well as
 authentication and security for the HTTP and protocol buffer
 transports. An access control list (ACL) may be created to
 control administration and access to indexes. All official Riak
 clients should now have full support for Yokozuna's
 administration and search API. Stored boolean fields and tagging
 support were fixed for the protocol buffer transport. And
 finally, documentation was added. The new CONCEPTS document [1]
 goes over various important concepts in Yokozuna and the
 RESOURCES document [2] has links to other resources for
 learning. There isn't much documentation on specifically how to
 use the new security features besides in the pull request itself
 but I will rectify that soon with a security specific doc page.

 This release may confuse some people given that the Riak 2.0 Tech
 Preview (2.0.0pre5) [3] was just released last week. Why continue
 with separate Yokozuna releases? What is the difference? These
 questions are answered in the INSTALL document [4], but the short
 story is that Yokozuna runs on a monthly release cycle and
therefore outpaces official Riak releases. These monthly releases
 allow you to try the latest Yokozuna features and bug fixes
 without waiting for the next Riak release. These releases should
 never be used for production. See the INSTALL document for more
 information.

 In summary: If you need to test the latest features then use the
 Riak-Yokozuna source package. Otherwise just stick to the tech
 preview until the final Riak 2.0 package drops.

 Finally, the only feature in Riak-Yokozuna 0.11.0 not found in
 Riak 2.0.0pre5 is the security feature. See the release notes and
 install document for more details.

 https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0110

 https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md

 -Z

 [1]: https://github.com/basho/yokozuna/blob/develop/docs/CONCEPTS.md

 [2]: https://github.com/basho/yokozuna/blob/develop/docs/RESOURCES.md

 [3]: http://docs.basho.com/riak/2.0.0pre5/downloads/

 [4]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.11.0

2013-11-07 Thread Ryan Zezeski
Riak Users,

Today I'm happy to announce the 0.11.0 release of Yokozuna.

This release brings Riak Java Client support as well as
authentication and security for the HTTP and protocol buffer
transports. An access control list (ACL) may be created to
control administration and access to indexes. All official Riak
clients should now have full support for Yokozuna's
administration and search API. Stored boolean fields and tagging
support were fixed for the protocol buffer transport. And
finally, documentation was added. The new CONCEPTS document [1]
goes over various important concepts in Yokozuna and the
RESOURCES document [2] has links to other resources for
learning. There isn't much documentation on specifically how to
use the new security features besides in the pull request itself
but I will rectify that soon with a security specific doc page.

This release may confuse some people given that the Riak 2.0 Tech
Preview (2.0.0pre5) [3] was just released last week. Why continue
with separate Yokozuna releases? What is the difference? These
questions are answered in the INSTALL document [4], but the short
story is that Yokozuna runs on a monthly release cycle and
therefore outpaces official Riak releases. These monthly releases
allow you to try the latest Yokozuna features and bug fixes
without waiting for the next Riak release. These releases should
never be used for production. See the INSTALL document for more
information.

In summary: If you need to test the latest features then use the
Riak-Yokozuna source package. Otherwise just stick to the tech
preview until the final Riak 2.0 package drops.

Finally, the only feature in Riak-Yokozuna 0.11.0 not found in
Riak 2.0.0pre5 is the security feature. See the release notes and
install document for more details.

https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0110

https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md

-Z

[1]: https://github.com/basho/yokozuna/blob/develop/docs/CONCEPTS.md

[2]: https://github.com/basho/yokozuna/blob/develop/docs/RESOURCES.md

[3]: http://docs.basho.com/riak/2.0.0pre5/downloads/

[4]: https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna: Riak Python client PB error with Solr stored boolean fields

2013-10-15 Thread Ryan Zezeski
Dave,

This is a bug in Yokozuna.  I have a hot-patch you can try.  I'll email you
directly with an attachment.

I created an issue as well.

https://github.com/basho/yokozuna/issues/209

-Z




On Mon, Oct 14, 2013 at 9:27 PM, Dave Martorana d...@flyclops.com wrote:

 I studied the problem I was having with using the Python client's
 .fulltext_search(...) method and got it down to this - it seems that I get
 an error when searching against Solr using the Python client's
 .fulltext_search(...) method (using protocol buffers) whenever I have a *
 stored* boolean field.

 In my schema, I have:

 <field name="banned" type="boolean" indexed="true" stored="true" />

 With that (or any named field of type boolean that is set to
 stored=true) I receive the following stack trace:

 http://pastebin.com/ejCixPEZ

 In the error.log file on the server, I see the following repeated:

 2013-10-15 01:21:17.480 [error] 0.2872.0@yz_pb_search:maybe_process:95
 function_clause
 [{yz_pb_search,to_binary,[false],[{file,src/yz_pb_search.erl},{line,154}]},{yz_pb_search,encode_field,2,[{file,src/yz_pb_search.erl},{line,152}]},{lists,foldl,3,[{file,lists.erl},{line,1197}]},{yz_pb_search,encode_doc,1,[{file,src/yz_pb_search.erl},{line,144}]},{yz_pb_search,'-maybe_process/3-lc$^0/1-0-',1,[{file,src/yz_pb_search.erl},{line,76}]},{yz_pb_search,maybe_process,3,[{file,src/yz_pb_search.erl},{line,76}]},{riak_api_pb_server,process_message,4,[{file,src/riak_api_pb_server.erl},{line,383}]},{riak_api_pb_server,connected,2,[{file,src/riak_api_pb_server.erl},{line,221}]}]

 Does anyone have any insight? I'm not a Solr expert, so perhaps storing
 boolean fields for retrieval is not a good idea? I know if I index but
 don't store, I can still successfully search against a boolean value.

 Thanks!

 Dave



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak search restore

2013-10-15 Thread Ryan Zezeski
Jon,

The schema file is stored in a special bucket '_rs_schema' as well as
cached in memory.

-Z


On Fri, Oct 4, 2013 at 2:33 AM, Jon Debonis j...@trov.com wrote:

 Hello,

 Riak includes these commands:

 search-cmd set-schema [INDEX] SCHEMAFILE
 search-cmd show-schema [INDEX]

 Once imported/loaded, where is this schema file stored? Is it in a bucket,
 or on the filesystem?

 Thanks
 Jon

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.10.0

2013-10-08 Thread Ryan Zezeski
Riak Users,

The 0.10.0 release of Yokozuna is ready.

This release brings a few features such as an upgrade in Solr version along
with some basic indexing and query stats. The default index has been
removed returning write performance closer to baseline for non-indexed
buckets. Disk usage was decreased by removing the default index and the
unused timestamp from the entropy data. Among the list of other fixes a
notable one is the improvement of Solr start-up and crash semantics. If
Solr crashes too frequently then Yokozuna will stop the local Riak node.

https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#0100

For installation instructions see the INSTALL doc.

https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md

The 0.11.0 release should be wrapped up towards the end of next week.

https://github.com/basho/yokozuna/issues?milestone=11state=open

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.9.0

2013-09-09 Thread Ryan Zezeski
Riak Users,

The ninth release of Yokozuna has arrived.  It is now integrated with the
Riak development branch.  This means no more special merge branches and you
get the latest and greatest Riak code.  It also means Yokozuna is on track
to be delivered with the next release of Riak.  There is now support for
index and schema administration over protocol buffers.  A major performance
regression was fixed.  An AAE deadlock issue was fixed.  And work has
started for Riak Search migration.  For a full list of changes see the
release notes.

https://github.com/basho/yokozuna/blob/develop/docs/RELEASE_NOTES.md#090

For installation instructions see the INSTALL doc.

https://github.com/basho/yokozuna/blob/develop/docs/INSTALL.md

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna and Spatial Search

2013-09-08 Thread Ryan Zezeski
Vincenzo,

I replied on the GitHub issue.

-Z


On Sun, Sep 8, 2013 at 8:23 AM, Vincenzo Vitale
vincenzo.vit...@gmail.comwrote:

 I got it working with this change to the default conf:
 https://github.com/basho/yokozuna/pull/169

 Before doing this, I first tried creating my own schema but put was
 hanging.


 V.


 On Sun, Sep 8, 2013 at 3:56 AM, Vincenzo Vitale vincenzo.vit...@gmail.com
  wrote:

 Hi,

 I'm trying to make SpatialSearch in my application working with Yokozuna.
 (develop branch, hash 601560bf9ea0859e598957c13733fbbb0e656e17 of the 6th
 september)

 The json object looks like this:


 {where:{latitude:7430019,longitude:4210023,geolocation_p:7.430019,4.210023},timestamp:2013-09-08T01:10:07.752Z}

  since there is already a dynamic field for *_p defined.

 But the query:

 http://127.0.0.1:8093/solr/my-index/select?q=*:*&fq={!geofilt}&spatial=true&pt=7.430019%2C4.210023&sfield=where_geolocation_p&d=1

 returns the error:
 can not use FieldCache on multivalued field:
 where_geolocation_p_0_coordinate


 Looking at this:

 http://stackoverflow.com/questions/7068605/solr-spatial-search-can-not-use-fieldcache-on-multivalued-field

 it seems the problem is the missing parameter and the dynamic field
 declaration for *_coordinates in the configuration file.

 Is this the cause of the problem?

 The _yz_default.xml files in the data directory seems overwritten every
 time riak is restarted, is there a way to customize the solr configuration
 per bucket?


 Thanks in advance,
 Vincenzo.


 --
 If your e-mail inbox is out of control, check out
 http://sanebox.com/t/mmzve. I love it.




 --
 If your e-mail inbox is out of control, check out
 http://sanebox.com/t/mmzve. I love it.

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak core dumped, merge_index corruption?

2013-08-06 Thread Ryan Zezeski
Deyan,

What Riak version are you running?  There was a corruption issue discovered
and fixed in the 1.4.0 release.

https://github.com/basho/riak/blob/riak-1.4.0/RELEASE-NOTES.md#issues--prs-resolved
https://github.com/basho/merge_index/pull/30

As for fixing, you'll want to delete the buffer files for the partitions
which are having issues.  E.g. if you look in crash.log you'll see
partition numbers for the crashing vnodes.

 **  Data  ==
{state,685078892498860742907977265335757665463718379520,riak_search_vnode,undefined,undefined,none,undefined,undefined,undefined,undefined,0}

In the
/storage/riak/merge_index/685078892498860742907977265335757665463718379520
you'll see buffer files.  You'll want to delete those.  After deleting all
these bad buffers Riak Search should start fine.  You'll then want to
upgrade to 1.4.1 to avoid corruption in the future.  Finally, since you
have to delete the buffers you'll have missing indexes and you'll want to
re-index your data.
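
Concretely, using the partition above as an example, that would be something
like the following, run with the node stopped and repeated for every partition
that shows up in crash.log:

riak stop
rm /storage/riak/merge_index/685078892498860742907977265335757665463718379520/buffer.*
riak start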

Since only one of your nodes experienced corruption you can use the built-in
repair functionality to re-index only data for those partitions.  First
you'll want to attach to one of your nodes.  Then for each partition run
the following.

 riak_search_vnode:repair(P)

Make sure to run repair for only one partition at a time to avoid
overloading anything.

To determine when a repair is finished you can periodically call the
following.  Once it returns 'no_repair' that indicates it has finished.

 riak_search_vnode:repair_status(P)

Here is more information on the repair command.

http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/


-Z



On Tue, Aug 6, 2013 at 5:17 AM, Deyan Dyankov dyan...@cloudxcel.com wrote:

 hi,

 we have a 3 node cluster and one of the node crashed yesterday.
 Nodes are db1, db2 and db3. We started other services on db1 and db2 and
 db1 crashed. Currently db2 and db3 are fine, balanced, receiving writes and
 serving reads.
 However, db1 has issues starting. When I start the node, it outputs
 numerous errors and this finally results in a core dump. We use Riak search
 and this may be the reason for the dump. After starting the node, these are
 the first errors that are seen in the log file:

 […]
 2013-08-06 11:06:08.989 [info] 0.7.0 Application erlydtl started on node
 'r...@db1.locations.cxl-cdn.net'
  2013-08-06 11:06:16.675 [warning] 0.5010.0 Corrupted posting detected in
  /storage/riak/merge_index/456719261665907161938651510223838443642478919680/buffer.598
  after reading 228149 bytes, ignoring remainder.
 2013-08-06 11:06:18.922 [error] 0.5310.0 CRASH REPORT Process 0.5310.0
 with 0 neighbours exited with reason: bad argument in call to
 erlang:binary_to_term(131,108,0,0,0,1,10
 4,4,104,3,109,0,0,0,25,99,120,108,101,118,101,110,116,115,95,99,97,107,101,...)
 in mi_buffer:read_value/2 line 162 in gen_server:init_it/6 line 328
 2013-08-06 11:06:20.751 [error] 0.5309.0 gen_fsm 0.5309.0 in state
 started terminated with reason: no function clause matching
 riak_search_vnode:terminate({{badmatch,{error,{b
 adarg,[{erlang,binary_to_term,[131,108,0,0,0,1,104,4,104,3,109,0,0,0,25,...],...},...]}}},...},
 undefined) line 233
 […]

 Attached is an archive of the /var/log/riak directory. The logs there are
 for the latest starting attempt. Riak core dumped in a minute or two after
 being started.
 Is there a way to fix the merge index corruption and start the node?

 thank you for your efforts,
 Deyan


 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.8.0

2013-08-05 Thread Ryan Zezeski
Riak Users,

The eighth release of Yokozuna is out.  It is now considered alpha and will
soon become part of Riak proper.  There could still be breaking
changes leading up to the 1.0.0 release which is currently scheduled
for early October.

The main things of interest this release are the re-target to Riak
1.4.0 and removal of a race condition around index creation.

See the release notes for more detail.

https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#080

Here are install instructions.

https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md#source-package

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data population of Yokozuna on key-path in schema?

2013-07-18 Thread Ryan Zezeski
As Eric said, the XML extractor causes the nested elements to become
concatenated by an underscore.  Extractor is a Yokozuna term.  It is the
process by which a Riak Object is mapped to a Solr document.  In the case
of a Riak Object whose value is XML the XML is flattened by a)
concatenating nested elements with '_' and b) concatenating attributes with
'@' (this can be changed if necessary, just ask).  Yokozuna provides a
resource to test how a given object would be extracted.

curl -X PUT -i -H 'content-type: application/xml' 'http://host:port/extract'
--data-binary @some.xml

This will return a JSON representation of the field-values extracted from
the object.  You can use a json pretty printer like jsonpp to make it
easier to read.

-Z



On Wed, Jul 17, 2013 at 8:51 PM, Eric Redmond eredm...@basho.com wrote:

 That's correct. The XML extractor nests by element name, separating
 elements by an underscore.

 Eric

 On Jul 17, 2013, at 12:46 PM, Dave Martorana d...@flyclops.com wrote:

 Hi,

 I realize I may be way off-base, but I noticed the following slide in
 Ryan’s recent Ricon East talk on Yokozuna:

 http://cl.ly/image/3s1b1v2w2x12

 Does the schema pick out values based on key-path automatically? For
 instance,

 <commit><repo>val</repo>...</commit>

 automatically gets mapped to the “commit_repo field definition for the
 schema?

 Thanks!

 Dave
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna kv write timeouts on 1.4 (yz-merge-1.4.0)

2013-07-18 Thread Ryan Zezeski
Dave,

I'm currently in the process re-targeting Yokozuna to 1.4.0 for the 0.8.0
release.  I'll ping this thread when the transition is complete.

-Z


On Wed, Jul 17, 2013 at 8:53 PM, Eric Redmond eredm...@basho.com wrote:

 Dave,

 Your initial line was correct. Yokozuna is not yet compatible with 1.4.

 Eric

 On Jul 15, 2013, at 1:00 PM, Dave Martorana d...@flyclops.com wrote:

 Hi everyone. First post, if I leave anything out just let me know.

 I have been using vagrant in testing Yokozuna with 1.3.0 (the official
 0.7.0 “release) and it runs swimmingly. When 1.4 was released and someone
 pointed me to the YZ integration branch, I decided to give it a go.

 I realize that YZ probably doesn’t support 1.4 yet, but here are my
 experiences.

 - Installs fine
 - Using default stagedevrel with 5 node setup
 - Without yz enabled in app.config, kv accepts writes and reads
 - With yz enabled on dev1 and nowhere else, kv accepts writes and reads,
 creates yz index, associates index with bucket, does not index content
 - With yz enabled on 4/5 nodes, kv stops accepting writes (timeout)

 Ex:

 (env)➜  curl -v -H 'content-type: text/plain' -XPUT '
 http://localhost:10018/buckets/players/keys/name' -d Ryan Zezeski
 * Adding handle: conn: 0x7f995a804000
 * Adding handle: send: 0
 * Adding handle: recv: 0
 * Curl_addHandleToPipeline: length: 1
 * - Conn 0 (0x7f995a804000) send_pipe: 1, recv_pipe: 0
 * About to connect() to localhost port 10018 (#0)
 *   Trying 127.0.0.1...
 * Connected to localhost (127.0.0.1) port 10018 (#0)
  PUT /buckets/players/keys/name HTTP/1.1
  User-Agent: curl/7.30.0
  Host: localhost:10018
  Accept: */*
  content-type: text/plain
  Content-Length: 12
 
 * upload completely sent off: 12 out of 12 bytes
  HTTP/1.1 503 Service Unavailable
  Vary: Accept-Encoding
 * Server MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue) is
 not blacklisted
  Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue)
  Date: Mon, 15 Jul 2013 19:54:50 GMT
  Content-Type: text/plain
  Content-Length: 18
 
 request timed out
 * Connection #0 to host localhost left intact

 Here are my Vagrant file:

 https://gist.github.com/themartorana/460a52bb3f840010ecde

 and build script for the server:

 https://gist.github.com/themartorana/e2e0126c01b8ef01cc53

 Hope this helps.

 Dave

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search and Sorting

2013-07-18 Thread Ryan Zezeski
Jeremiah,

Sorting is broken in protobuffs currently. Unfortunately the fix fell
through the cracks.

https://github.com/basho/riak_search/pull/136

-Z


On Thu, Jul 18, 2013 at 10:11 AM, Jeremiah Peschka 
jeremiah.pesc...@gmail.com wrote:

 I just confirmed that today I'm getting the correct sorting in the browser
 but not in CorrugatedIron. I'm about to start in on a day of working with a
 client. Will verify this afternoon.

 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop


 On Thu, Jul 18, 2013 at 6:55 AM, Ryan Zezeski rzeze...@basho.com wrote:

 Jeremiah,

 After a quick glance I don't see anything obvious in the code.  I notice
 you have a presort defined.  By any chance, if you remove the presort, do
 you get a correct sorting on the creation_dt field?

 -Z


 On Wed, Jul 17, 2013 at 5:30 PM, Jeremiah Peschka 
 jeremiah.pesc...@gmail.com wrote:

 I'm attempting to sort data with Riak Search and have run into a
 distinct lack of sorting.

 When using curl (The Fullest Featurest Riak Client EVAR™), I query the
 following URL:
 http://localhost:10038/solr/posts/select?q=title_txt:google&presort=key&sort=creation_dt&rows=500

 Being aware that results are sorted AFTER filtering on the server side,
 I adjusted my query to accept too many rows: there are 335 rows that meet
 my query criteria. However, Riak Search returns 10 sorted by some random
 criteria that I'm not aware of (it's not score, that's for sure).

 Is this behavior expected? Is there something that I've missed in my
 query?

 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data population of Yokozuna on key-path in schema?

2013-07-18 Thread Ryan Zezeski
Yes, it has similar rules.  Nested objects have their fields joined by '_'.
 Arrays become repetitive field names, which should map to a multi-valued
field.  You can use the URL I provided in the last response to see exactly
how field-values are extracted.
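
For example, with a made-up JSON value just to show the flattening:

curl -X PUT -H 'content-type: application/json' 'http://host:port/extract' \
  --data-binary '{"name":{"first":"ryan"},"tags":["a","b"]}'

You should get back something along the lines of name_first -> ryan, plus
tags repeated once per array element (a multi-valued field).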


On Thu, Jul 18, 2013 at 12:16 PM, Dave Martorana d...@flyclops.com wrote:

 Does the JSON extractor work in a similar fashion, or does it follow its
 own rules? We don’t use XML anywhere (but JSON everywhere). Thanks!

 Dave


 On Thu, Jul 18, 2013 at 9:31 AM, Ryan Zezeski rzeze...@basho.com wrote:

 As Eric said, the XML extractor causes the nested elements to become
 concatenated by an underscore.  Extractor is a Yokozuna term.  It is the
 process by which a Riak Object is mapped to a Solr document.  In the case
 of a Riak Object whose value is XML the XML is flattened by a)
 concatenating nested elements with '_' and b) concatenating attributes with
 '@' (this can be changed if necessary, just ask).  Yokozuna provides a
 resource to test how a given object would be extracted.

  curl -X PUT -i -H 'content-type: application/xml' 'http://host:port/extract'
 --data-binary @some.xml

 This will return a JSON representation of the field-values extracted from
 the object.  You can use a json pretty printer like jsonpp to make it
 easier to read.

 -Z




 On Wed, Jul 17, 2013 at 8:51 PM, Eric Redmond eredm...@basho.com wrote:

 That's correct. The XML extractor nests by element name, separating
 elements by an underscore.

 Eric

 On Jul 17, 2013, at 12:46 PM, Dave Martorana d...@flyclops.com wrote:

 Hi,

 I realize I may be way off-base, but I noticed the following slide in
 Ryan’s recent Ricon East talk on Yokozuna:

 http://cl.ly/image/3s1b1v2w2x12

 Does the schema pick out values based on key-path automatically? For
 instance,

  <commit><repo>val</repo>...</commit>

 automatically gets mapped to the “commit_repo field definition for the
 schema?

 Thanks!

 Dave
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.7.0

2013-07-01 Thread Ryan Zezeski
Riak Users,

Today I'm excited to bring you the 0.7.0 release of Yokozuna.  It includes
some new features such as an upgrade to Solr 4.3.0, isolation of index
failures, one-to-many index-to-buckets relationship, and map-reduce
support.  There is also a performance improvement in index throughput.
 Along with several bug fixes.  See the release notes for more detail.

https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#070

Once again I have forgone the EC2 AMI.  Only a source package is available.
 You can find instructions for installing on the INSTALL page.

https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md

For those that have been using Yokozuna: the one-to-many change is a
breaking change.  Creating an index no longer implicitly indexes the bucket
with the same name.  Two steps must be performed.  First you create the
index as before.  Second you add a bucket property 'yz_index' whose value
is the name of the index you wish to index that bucket under.  There is an
example in the README.

https://github.com/basho/yokozuna#creating-an-index

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: [ANN] Yokozuna 0.7.0

2013-07-01 Thread Ryan Zezeski
Yokozuna supports protobuff already.  It uses the same protocol as current
Riak Search so it is currently limited to that feature set.  It should
just work.  However, currently, if both Riak Search and Yokozuna are
enabled then Riak Search will handle all queries.

Are you asking in regards to CorrugatedIron by chance?

-Z


On Mon, Jul 1, 2013 at 12:29 PM, Jeremiah Peschka 
jeremiah.pesc...@gmail.com wrote:

 What level of PBC integration can we expect from Yokozuna? Is that
 developed but not documented or is that a TBA feature?

 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
 MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop


 On Mon, Jul 1, 2013 at 8:46 AM, Ryan Zezeski rzeze...@basho.com wrote:

 Riak Users,

 Today I'm excited to bring you the 0.7.0 release of Yokozuna.  It
 includes some new features such as an upgrade to Solr 4.3.0, isolation of
 index failures, one-to-many index-to-buckets relationship, and map-reduce
 support.  There is also a performance improvement in index throughput.
  Along with several bug fixes.  See the release notes for more detail.

 https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#070

 Once again I have forgone the EC2 AMI.  Only a source package is
 available.  You can find instructions for installing on the INSTALL page.

 https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md

 For those that have been using Yokozuna.  The one-to-many change is a
 breaking change.  Creating an index no longer implicitly indexes the bucket
 with the same name.  Two steps must be performed.  FIrst you create the
 index as before.  Second you add a bucket property 'yz_index' whose value
 is the name of the index you wish to index that bucket under.  There is an
 example in the README.

 https://github.com/basho/yokozuna#creating-an-index

 -Z

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Question: Riak search and wildcard

2013-06-07 Thread Ryan Zezeski
Hi Otto


 I will propably make a config file in my app or some temporary
 variable which will contain 10 keys which I get with a map/reduce than
 is run daily, and then I'll fetch the predefined set from Riak when I
 need the 10 first results. Although this will require 10 requests to
 get 10 results, search would have been ideal since one request can
 return a big set of results..

 It is a pitty the search feature does not support q=* as query.


There are reasons for why this is not implemented in Riak Search.  For one,
it would be massively expensive as it requires iterating through all
inverted indexes on a covering set of partitions and building up the entire
list of matching keys (i.e. all keys) in memory on the coordinating node.

The solution which will replace Riak Search, Yokozuna [1], can perform this
operation just fine.  But you will need to store the fields if you wish to
get their values back in the query result, otherwise you still need 11
operations, 1 for the query, 10 to get the values (or use map/reduce as a
multiget).

However, since there is nothing to score on, your 10 results are
effectively random (or perhaps it would fall back to index order).  So I'm
not sure I follow you when you say 10 first results.  What is first in
relation to?
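For what it's worth, with Yokozuna the match-all query is just a normal Solr
query, something along these lines (port/path assume a default setup and the
index name is an example):

curl 'http://localhost:8098/solr/my_index/select?q=*:*&rows=10&wt=json'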

-Z

[1]: https://github.com/basho/yokozuna
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.6.0

2013-05-19 Thread Ryan Zezeski
Riak Users,

Today I'm pleased to announce the 0.6.0 release of Yokozuna.  Two
highlights of this release are:

1. Initial protobuff support at parity with Riak Search.  This means that
existing Riak clients which have Riak Search/PB support should now be able
to query Yokozuna.  Please note that, currently, having both Riak Search
and Yokozuna enabled will cause issues.  That will be addressed soon in
order to allow migrations in the future.

2. A 30-40% performance improvement in query throughput thanks to caching of
coverage plans.  The improvement will vary by workload.  It will be most
noticeable on slow CPUs or when query results come from the Solr cache.  This
is because the patch removes a lot of CPU work on the Yokozuna side during
the query.

There are a slew of other changes.  See the release notes for more detail
[1].

I decided to forgo the EC2 version for this release.  The base AMI is
starting to get long in the tooth and I'm not sure if anyone is actually
making use of the Yokozuna AMI.  If you need it please ping me via email
and I'll be sure to build an 0.6.0 AMI.

I've also added updated instructions for installing Riak-Yokozuna.  The
preferred method now is to use the source package.  See the INSTALL doc for
more details [2].


[1]: https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#060

[2]: https://github.com/basho/yokozuna/blob/master/docs/INSTALL.md
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting from search index but not from bucket

2013-04-23 Thread Ryan Zezeski
 Is it possible to delete an object from the search index without deleting
 it from the bucket?
 I'm using Erlang pb client.


It is possible but not a first-class operation and certainly not supported
via the PB client.  It would require custom erlang code.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna max practical bucket limit

2013-04-22 Thread Ryan Zezeski
Elias,

I could see that being something some folks want.  From my point of view, I
 find that the existing design of one core per bucket may be more useful, so
 long as I can search across cores with similar schemas (I created an 
 issuehttps://github.com/basho/yokozuna/issues/87to track that feature), as 
 it allows me to easily drop the index for a
 bucket.  In a multi-tenant environment, where you may have an index per
 customer, this is rather useful.  A lot less painful than trying to delete
 the index (and data) by performing a key listing and delete operations.


Well you still can't avoid the key-listing/delete for Riak itself.  For
Solr this would be a delete-by-query which isn't nearly as expensive.
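For reference, a delete-by-query is a single request to the index's Solr
update handler, something like this (the port and index name here are
assumptions based on a default setup):

curl -XPOST 'http://localhost:8093/solr/my_index/update?commit=true' \
  -H 'content-type: text/xml' \
  --data-binary '<delete><query>*:*</query></delete>'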



 As I've expressed before, I wish buckets behaved the same way, segregating
 their data into distinct backend, but I understand that this results in
 lower resource usage, as things like LevelDB caches would then not be
 shared and you'd need additional file descriptors.  At the very least, it
 would be great if backend instances could be
 created programmatically through the HTTP or PB API, rather than having to
 modify app.config and perform a rolling restart.  That not very
 operationally friendly.


Yes, there are benefits to be had both ways.  Segregating the actual
backend instances allows for efficient drop of entire bucket, but adds
strain in terms of file descriptors and I/O contention.  Multi-backend
sorta helps but is static in nature as you mention.



 As for large number of cores, I could see some folks creating many of
 them.  Buckets are relatively cheap, since by default they are all stored
 in the default backend instance.  Their only cost being
 the additional network traffic for gossiping non-default bucket properties.
  So folks create them freely. Once Yokozuna is better documented, it should
 be pointed out that the same is not true of a bucket's index, since they
 create one core per bucket.  So an indexed bucket has quite a bit more
 static overhead than non-indexed one.


Good point.



 If you use Riak and have 300 customers, you can easily create a bucket per
 customer, even if you only have 64 partions and are using Riak Search on
 all of them, as Search stores all the data in the same merge index backend.
  You may want to twice before upgrading such cluster to Yokozuna.


Well, Riak Search will have issues as well.  First, each bucket will
require a pre-commit hook to be installed, which means custom bucket
properties must be copied into the ring.  There is a known drawback with Riak
where many bucket properties greatly reduce ring gossip throughput and can
cause issues.  I believe Joseph Blomstedt may have some patches going into
the next release that will improve this but ultimately we need to get
bucket properties out of the ring.  Even if that is solved, Riak Search
will have other tradeoffs such as substantially reduced feature support
compared to Yokozuna as well as reduced performance for many types of
queries.  But I do agree many indexes (thus cores) could pose a problem for
Yokozuna.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: expected_binaries error in search

2013-04-22 Thread Ryan Zezeski
Rob,

I reproduced this at the command line. Here I'm storing two documents, with
 IDs 'doc8' and 'doc9', into a search-enabled bucket named 'test-search'.

 # This command works, even though 'lsh' is empty. I believe this is
 because I've never put a field named 'lsh' in this bucket, 'test-search'.
  curl -v -XPUT -H "Content-Type: application/json" \
    -d '{"terms": "empty|en string|en test|en", "lsh": "", "segments": "d do doc docs"}' \
    'http://riak.lumi:8098/riak/test-search/doc8?returnbody=true'

 # However, if I use the empty string for a field that has ever been
 indexed before, *then* I get a crash.
  curl -v -XPUT -H "Content-Type: application/json" \
    -d '{"terms": "", "segments": "d do doc docs"}' \
    'http://riak.lumi:8098/riak/test-search/doc9?returnbody=true'


I copy/pasted your commands and could not reproduce.  Both docs indexed
correctly and were returned when running the following search.  However,
the reason this works is because it's going through KV and using the commit
hook to index.  The error you originally pasted is from data being indexed
via the Riak Search Solr endpoint.  They are two different things.

As it turns out there is a bug(?) in the Solr end-point.  It doesn't like
empty fields.  E.g. if I send the following XML I can reproduce the error.

---
<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <id>docA</id>
    <field name="terms">empty|en string|en test|en</field>
    <field name="lsh"></field>
    <field name="segments">d do doc docs</field>
  </doc>
</add>
---
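For reference, sending that XML to the Riak Search Solr endpoint looks
something like the following (doc.xml contains the XML above; port and path
assume a default setup):

curl -XPOST 'http://localhost:8098/solr/test-search/update' \
  -H 'content-type: text/xml' --data-binary @doc.xml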

To confirm, here's what I see in the log:

2013-04-22 17:02:56.394 [error]
0.4452.0@riak_solr_indexer_wm:malformed_request:37 Unable to parse
request: {expected_binaries,lsh,[]}

I filed an issue: https://github.com/basho/riak_search/issues/141




 As requested, I typed that redbug command into the Erlang console, and at
 the moment of the crash I get some output. The output baffles me, though.
 The included ID is in the style of ID we use for actual documents, which
 are stored in totally different buckets, and which I didn't refer to at all
 in my testing.


My guess is you tested this on the system while other writes were coming
in.  This is what redbug is picking up.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Active Anti Entropy with Bitcask Key Expiry

2013-04-18 Thread Ryan Zezeski
Ben,

AAE should not resurrect keys when bitcask expiry is enabled.  However, a
non-trivial amount of work may be performed if a lot of keys expire all at
once.

You're correct that the layers above bitcask have no notion of expiry.
 When a key expires no notification is sent to Riak.  This means that
hashtrees (which I'll call trees from here on out) will continue storing an
entry for a key after it has expired.  As long as all trees agree that the
key is still there AAE will be none the wiser about expiry.  However, AAE
has its own notion of expiry.  Every tree has an expiration date at which
point it is discarded and rebuilt from scratch based on the data in the
backend.  By default trees expire after a week.  This means there could be
a window where the trees disagree because some were rebuilt and no longer
include the expired key.  At this point AAE will try to repair the data by
invoking a read-repair.  Since bitcask honors expiry on 'get' all N copies
will return not_found and thus read-repair will do nothing.  Then AAE will
send a 'rehash' request to all N replicas [1] [2].  The rehash will notice
the key is no longer and delete it from the tree.

So, keys should not be resurrected, but it could generate additional I/O
proportional to the number of keys expired.  For example:

1. bitcask expiry is set to 1 day
2. millions of keys are written in an hour's time span, thus every hour millions
of keys expire
3. the same key is never overwritten inside a weeks time
4. AAE is using default tree expiry of a week
5. the trees for a given preflist are _not_ all expired at about the same
time

In this scenario, when a tree expires it may have millions of expired keys
to deal with.  This means millions of Riak 'get' calls plus millions of
'rehash' calls.  Now, since the rehash operation is sent to all replicas
only 1 tree of a preflist needs to expire for all replica trees to be
repaired.  This means the maximum number of times you should take this hit
is Q / N where Q = ring size, N = n_val.

Point #3, #4, #5 really are the key here.  There must be an overlap where
keys are expired and only a subset of a preflist's trees have been rebuilt.
 The more often keys are re-written and the more nodes you have the less
likely it will be to hit this window.

-Z

[1]
https://github.com/basho/riak_kv/blob/master/src/riak_kv_exchange_fsm.erl#L232

[2] https://github.com/basho/riak_kv/blob/master/src/riak_kv_vnode.erl#L482


On Tue, Apr 16, 2013 at 11:07 AM, Ben Murphy benmmur...@gmail.com wrote:

 Does anyone know if these two place nice with each other? As far as I can
 see the higher layers sitting on top of bitcask are not aware that bitcask
 can expire keys. Would the anti-entropy code try to resurrect expired keys?

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: expected_binaries error in search

2013-04-09 Thread Ryan Zezeski
Rob,

That cryptic error is saying that it expected a binary value type for the
field 'lsh' (binary is a data structure in Erlang) but instead got an
empty list.  Do you by any chance have the exact data which is causing the
issue?  If you can isolate the data causing the problems then attach to the
riak console and run the following.

redbug:start("riak_solr_xml_xform:xform/1 -> return").

Then in another window try to index the data.  Copy the riak console output
and mail it to me.  My guess is something is getting parsed incorrectly.

-Z


On Tue, Apr 9, 2013 at 12:49 PM, Rob Speer r...@luminoso.com wrote:

 We're having problems where Riak nodes stop responding to requests,
 sometimes while trying to add documents and sometimes while trying to
 delete them.

 There are a lot of errors in the logs on all machines, and we're wondering
 if this has something to do with it. A message like this appears every 1-12
 minutes:

 2013-04-09 11:47:52.955 [error]
 0.29725.18@riak_solr_indexer_wm:malformed_request:37 Unable to parse
 request: {expected_binaries,lsh,[]}

 lsh is a field on the data structures we're indexing (it contains
 arbitrary tokens generated for locality-sensitive hashing). Here's an
 example of what we might be telling Riak Search to index. (It's intentional
 that we're using the whitespace analyzer on all fields.)

 {
'id': 'uuid-1b34a5a7d5894e1f92874066d074ecec',
'subsets': '__all__ subset1',
'terms': 'example|en text|en',
'lsh': 'ANRW BMkA CHyu DN60',
'segments': '1 1b 1b3 1b34'
 }

 This would get sent through self.riak.solr.add() in the Riak Python
 client, of which we're using the latest version committed to master
 (1a379dc1), via the Protocol Buffers transport.

 It is possible to store a document that is missing 'terms' or 'lsh'; is
 Riak complaining about their absence when it throws an expected_binaries
 error? Would this be causing Riak to stop responding to its client
 connections?


 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: AAE and changing the replication factor

2013-04-08 Thread Ryan Zezeski
Elias,

Setting the n_val higher should add the missing replicas.  However, setting
it lower will currently leave the extra replicas alone.  This was chosen to
err on the side of caution for now.  This would be easy enough to verify
with a riak_test [1] but I don't think we have one at the moment.
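For reference, changing the n_val is just a bucket property change, e.g.
(bucket name is an example):

curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/riak/my_bucket' -d '{"props":{"n_val":5}}'

After that AAE should notice and repair the missing replicas on a subsequent
exchange.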

-Z

[1]: https://github.com/basho/riak_test



On Mon, Apr 8, 2013 at 8:08 PM, Elias Levy fearsome.lucid...@gmail.comwrote:

 I am wondering if AAE means that we can now change the replication factor
 of a Riak bucket and have the additional missing replicas be created by
 AAE, rather than having to reinsert all the data in the bucket.

 Elias Levy

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Post commit hooks are a single process, so they are executed in the same order as the commits ?

2013-04-08 Thread Ryan Zezeski
Simon,


On Mon, Apr 8, 2013 at 7:14 PM, Simon Majou si...@majou.org wrote:

 Hello,

 I want to sync a bucket of a first cluster with the bucket of a second
 cluster. To do that I think using the post commit hook.


If you didn't know, this is exactly what Riak Enterprise was built to do.
 I.e. handle multi-cluster replication.  However, if you want to give it a
go on your own a post-commit hook is one way to get the job done.  You'll
want to think through failure scenarios where the receiving cluster is down
and how to deal with msgs that are dropped between clusters.  The
post-commit hook runs on a process called the coordinator, there is a
coordinator for every incoming request.  So you won't block the vnodes,
which is important, but the client/user request will block until your
post-commit returns.



 Is there any risk that the sequence of PUTs to be mixed in such a scenario
 ?


Do you mean the sequence seen on cluster A vs. cluster B?  Are you asking
if the object could appear to be on B before A even though the PUT was sent
to A?  The answer is, it depends.  With a healthy system it's probably
unlikely but it will depend on your DW values and state of each cluster.
 E.g. if cluster A nodes get slow disk I/O then perhaps the replication to
cluster B could beat writes on A.  If we start introducing node and network
failures, or changing W/DW values then things can get more complicated.
 You could have success on cluster A, fire replica to cluster B, all
primary nodes for that object on cluster A die, now cluster B will have a
key for which cluster A says not_found (well, not totally true, depends on
your PR value).

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Yokozuna max practical bucket limit

2013-04-08 Thread Ryan Zezeski
Elias,

This is exactly why I chose not to make a core per partition.  My gut
feeling was that most users are likely to have more partitions than indexed
buckets.  I don't know the overhead per-core or what the limits might be.
 I would recommend the Solr mailing list for questions like that.  I've
also looked at that LotsOfCores page before.  One benefit to using Solr
is that any improvements made to it should also trickle down to Yokozuna.

That said, I still plan to allow a one-to-many mapping from index to
buckets.  That would allow many KV buckets to index under the same core.  I
have an idea of how to implement it.  I'm fairly certain it would work just
fine.  I just need to add a GitHub issue and then it's a simple matter of
coding.

-Z


On Mon, Apr 8, 2013 at 6:25 PM, Elias Levy fearsome.lucid...@gmail.comwrote:

 Thinking about Yokozuna it would appear that for some set of hardware
 specs there must be some maximum practical number of indexed buckets.
  Yokozuna creates one Solr core per bucket per node.  Scaling the Riak
 cluster will reduce the amount of data indexed per core, but not the number
 of cores node.  I assume there is some static overhead per Solr core, and
 thus a maximum number of indexed buckets per cluster based on the per node
 resources.

 Any idea what this may be be, roughly?  Has anyone tried to max out the
 number of indexed buckets?

 Searching the Solr mailing list it seems some folks have up to 800 cores
 per slave, but their hardware is unknown and queries are being served by
 slaves, so the cores are only indexing.

 It looks like there is some ongoing work in Solr to support large number
 of cores by dynamically loading and unloading them (
 http://wiki.apache.org/solr/LotsOfCores).  Is this something Yokozuna may
 make use of?  It may be to expensive a hit for latencies.

 Elias Levy


 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.5.0

2013-04-05 Thread Ryan Zezeski
Riak Users,

Today I'm pleased to announce the 0.5.0 release of Yokozuna.  This release
includes a bit of everything.  New features, bug fixes, an upgrade to Solr
4.2.0, and search performance improvement.  See the full release notes for
more detail.

Thank you to @timdoug and @kyleslattery for their contributions.

Release notes:

https://github.com/basho/yokozuna/blob/master/docs/RELEASE_NOTES.md#050

EC2 Deployment:

https://github.com/basho/yokozuna/blob/5a62fde0a9d79f9ae392922567aadadd47094b53/docs/EC2.md

Source Package:

https://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.5.0-src.tar.gz

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Set the default search schema analyzer_factory to 'standard_analyzer_factory' for all future buckets

2013-04-02 Thread Ryan Zezeski
On Mon, Mar 4, 2013 at 9:36 AM, vvsanil vvsanilku...@gmail.com wrote:

 Is there anyway to set the default search schema analyzer_factory to
 'standard_analyzer_factory' for all future buckets (i.e. without having to
 manually set schema each time a new bucket is created) ?


Yes, look for the file `default.def` under your lib dir where
`riak_search/priv` lives.  That file is used as the default schema when one
is not explicitly created.  N.B. you must update this file on _EVERY_ node
or you may get unexpected results.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search cannot detect element with '&'

2013-04-02 Thread Ryan Zezeski
Tony,

Riak Search is treating the '&' the same as 'AND'.  If you encode it as
%5C%26 it should work.
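E.g. something like this should find the term (host, bucket, and field are
examples):

curl 'http://localhost:8098/solr/my_bucket/select?q=title:foo%5C%26bar'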

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How should I avoid words that are effectively stopwords in Riak Search?

2013-03-29 Thread Ryan Zezeski
Rob,

Riak Search doesn't have a traditional term-frequency count.  It has
something similar but it's an estimate and it is much more expensive than a
simple table lookup.  Even if it did have term-frequency it doesn't really
expose it to the outside world.  Not only that, but the standard analyzer
provides no way to specify additional stop words.  You'd have to keep track
of this data externally and do some pre-processing to remove stopwords
before.

For the last 9 months I've been working on a project called Yokozuna with
the goal to replace Riak Search [1].  It's like Riak Search except much
better because the underlying engine is actually Solr/Lucene, not an
inferior clone written in Erlang.  In that case you could add new
stopwords, exploit query caching, and use newer features like LUCENE-4628
[2] to help combat high frequency terms.  You'd also have an easy way to
get frequency count for a given term to determine if you should make it a
stopword.

[1] https://github.com/basho/yokozuna

[2] https://issues.apache.org/jira/browse/LUCENE-4628
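As a concrete example, with Yokozuna you could spot high-frequency terms by
faceting on the field, something like this (index/field names are examples
and the path assumes a default setup):

curl 'http://localhost:8098/solr/my_index/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=20&wt=json'

The counts that come back make it easy to decide which terms are worth
treating as stopwords.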


On Fri, Mar 22, 2013 at 2:21 PM, Rob Speer r...@luminoso.com wrote:

 My company is starting to use Riak for document storage. I'm pretty happy
 about how it has been working so far, but I see the messages of foreboding
 and doom out there about Riak Search and I've encountered a problem myself.

 I can't really avoid using Riak Search, as full text indexing is a key
 feature we need to provide. If Riak Search is suboptimal, so is basically
 every other text index out there. We've just been burned by ElasticSearch's
 ineffective load balancing (who would have guessed, consistent hashing is
 kind of important).

 I know that performing searches in Riak Search that return many thousands
 of documents is discouraged for performance reasons, and the developers
 encourage removing stopwords to help with this. There's additionally, I
 have seen, a hard limit on the number of documents that can be examined by
 a search query; if any term matches more than 100,000 documents, the query
 will return a too_many_results error (and, incidentally, things will get so
 confused that, in the Python client, the *next* query will also fail with
 an HTTP error 400).

 The question is, what should I actually do to avoid this case? I've
 already removed the usual stopwords, but any particular set of documents
 might have its own personal stopwords. For example, in a database of
 millions of hotel reviews, the word 'hotel' could easily appear in more
 than 100,000 documents.

 If we need to search for '5-star hotel', it's wasteful and probably
 crash-prone to retrieve all the 'hotel' results. What I'd really like to do
 is just search for '5-star', which because of IDF scoring will have about
 the same effect. That requires knowing somehow that the word 'hotel'
 appears in too many documents.


 Is there a way to determine, via Riak, which terms are overused so I can
 remove them from search queries? Or do I need to keep track of this
 entirely on the client end so I can avoid searching for those terms?

 Thanks,
 -- Rob Speer

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.4.0

2013-03-06 Thread Ryan Zezeski
Hello Riak Users,

Today I'm pleased to announce the 0.4.0 release of Yokozuna.  This is a
small release in terms of features added, as there are no new features, but
an important release for reasons enumerated below.

* Performance improvements to Solr's distributed search thus improving
performance of Yokozuna queries [1] [2] [3].

* This release is based off Riak 1.3.0.  This release is essentially Riak
1.3.0 with the Yokozuna bits added to it.

* Yokozuna has moved from my personal GitHub account into the Basho
organization.  The prototype status is still in effect but this is a very
important step towards the goal of merging Yokozuna into Riak proper.

release notes:

https://github.com/basho/yokozuna/blob/v0.4.0/docs/RELEASE_NOTES.md

instructions to deploy on ec2:

https://github.com/basho/yokozuna/blob/c3a1cad34f65f1f5f1d416f3f25b2ab5254a583a/docs/EC2.md

source package:

http://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.4.0-src.tar.gz

-Z

[1] Yokozuna pull-request: https://github.com/basho/yokozuna/pull/26

[2] Upstream patch to Solr: https://issues.apache.org/jira/browse/SOLR-4509

[3] I discuss this change in depth:
http://www.zinascii.com/2013/solr-distributed-search-and-the-stale-check.html
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: About the node show down when occur riak_searc error of too_many_resut

2013-02-08 Thread Ryan Zezeski
Jason,

Riak Search has a limit of 100k results at which point it halts processing
by throwing an exception.  It does this to protect itself from having to
build an indeterminately sized list and then sort it.  You can raise this
limit but you might start seeing large process heaps using lots of CPU for
GC or sorting.

I'm having a bit of trouble understanding your second point.  Are you
saying that the node goes down after this error?  The only reason I see
that happening is if you run this query (or others that also match large
results) many times in succession, causing max-restart events (I'm referring
to Erlang/OTP supervisor/worker restarts) that eventually reach up to the
root supervisor and thus take the node down.

-Z


On Tue, Jan 22, 2013 at 5:16 AM, 郎咸武 langxian...@gmail.com wrote:

 Hi all

 The default value of 100,000 can be custom tuned with the
 max_search_results setting in the etc/app.config file.

 I using the default value. There are 1,000,000 K/V. I only
 invoke riakc_pb_socket:search(Pid, Bucket, name:u1*)[1] when  the node
 shutdown[2].
 The error is an obvious out of the default value.
 But the node is showdown. This is really out of my expectation.

 Is this a bug?
 Who can give me  some advice?

 Cheers ,Jason


 The enviroment:
 Erlang R14B04 (erts-5.8.5)
 FreeBSD meda082 9.0-RELEASE FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30
 UTC 2012
 riak1.2
 riakclint 1.3

 [1]{error,Error processing incoming message: throw:{too_many_results,\n
  {scope...}

 [2]
 (riak@127.0.0.1)1 17:03:55.626 [error] gen_server 0.2736.0 terminated
 with reason:
 {throw,{too_many_results,{scope,#Ref0.0.0.8359,test_riak_json

 ,value,{scope,#Ref0.0.0.8358,undefined,name,{range_sized,#Ref0.0.0.8362,{inclusive,u1},{inclusive,u1\377},all,undefined,[{ria

 k_search_client,'-search/7-fun-0-',4},{riak_search_client,fold_results,5},{riak_search_client,search,8},{riak_search_client,search_doc,8},{riak_search

 _utils,run_query,7},{riak_search_pb_query,run_query,7},{riak_search_pb_query,process,2},{riak_api_pb_server,process_message,4}]}^M^M
 17:03:55.643 [error] CRASH REPORT Process 0.2736.0 with 1 neighbours
 exited with reason:
 {throw,{too_many_results,{scope,#Ref0.0.0.8359,test_riak

 _json,value,{scope,#Ref0.0.0.8358,undefined,name,{range_sized,#Ref0.0.0.8362,{inclusive,u1},{inclusive,u1\377},all,undefined,

 [{riak_search_client,'-search/7-fun-0-',4},{riak_search_client,fold_results,5},{riak_search_client,search,8},{riak_search_client,search_doc,8},{riak_s
 earch_utils,run_query,7},{riak_search_pb_query,run_query,7},{riak_search_pb_query,process,2},{riak_api_pb_server,process_message,4}]}
 in gen_server:te
 rminate/6^M^M



 --
 只为成功找方法,不为失败找理由
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


[ANN] Yokozuna 0.3.0

2013-02-04 Thread Ryan Zezeski
Riak Users,

Today I'm happy to announce the 3rd pre-release of Yokozuna.  It's light on
new features but has some good performance improvements and added
robustness.  Here are the highlights:

* Allow store/retrieval of schemas via HTTP.

* Upgrade to Solr 4.1.0 and the latest Riak.

* Improve write/index throughput by disabling Solr's realtime get and
switching from XML update to JSON.

* Added robustness around AAE and default index creation.

* Listen on 'solr/index/select' to more easily work with existing clients
out of the box.

To see all changes read the full release notes [1].  Like the last two
releases, an AMI has been made, see the EC2 doc for more info [2].

New for this release is the addition of a source package.  I hope this
might encourage those who are scared off by the process of building from
git to give Riak/Yokozuna a try.  These four steps below will produce a
ready-to-run node under 'rel/riak' [3].

wget http://s3.amazonaws.com/yzami/pkgs/src/riak-yokozuna-0.3.0-src.tar.gz
tar zxvf riak-yokozuna-0.3.0-src.tar.gz
cd riak-yokozuna-0.3.0-src
make stage

[1]: https://github.com/rzezeski/yokozuna/blob/v0.3.0/docs/RELEASE_NOTES.md

[2]: https://github.com/rzezeski/yokozuna/blob/v0.3.0/docs/EC2.md

[3]: You may want to change some configuration first:
http://docs.basho.com/riak/1.2.1/cookbooks/Basic-Cluster-Setup/

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search on KV data in erlang terms

2013-01-22 Thread Ryan Zezeski
Takeshi,

The erlang extractors don't work when writing the data via HTTP as they
store the data as a binary.  I.e. [{"name", "bob"}] becomes the binary
"[{\"name\", \"bob\"}]" instead of the proplist you expect.  This is because the
extractor assumes the object data is already properly decoded.  It is
possible to write erlang terms via the erlang client, but I wouldn't
recommend it as it is not the common case and I feel like there are other
issues lurking in Riak if you do this.  Is there a particular reason you
are trying to store erlang terms?  Are you worried about space?  I would
just stick with JSON or XML if that is acceptable.
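For example, the JSON route for the same data would be (reusing your bucket
and key):

curl -X PUT -H 'content-type: application/json' \
  'http://localhost:8098/riak/mybucket/bob' -d '{"name":"bob"}'

bin/search-cmd search mybucket "name:bob"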

-Z


On Tue, Jan 22, 2013 at 10:15 AM, Takeshi Matsumura
takeshi4...@gmail.comwrote:

 Hi,

 I tried to store erlang data and query them by using the Riak Search
 without success so far, and thus would like to ask if I'm doing the right
 thing. Riak Search was enabled in the app.config file and the server was
 restarted. The pre-commit hook was installed from the command line.

 bin/search-cmd install mybucket

 The erlang data that I uploaded is a proplists with a single pair of key
 and value.

 [{name, bob}]

 It was uploaded by using the curl command with Content-Type
 application/x-erlang. (hoge.erl.txt contains the above erlang terms).

 curl -v -d @hoge.erl.txt -X PUT -H content-type: application/x-erlang 
 http://localhost:8098/riak/mybucket/bob;

 I could get the document by issuing curl command to
 /riak/mybucket/bob. The HTTP response header contained the correct
 Content-Type, application/x-erlang.

 Then I run the Riak Search from the command line.

 bin/search-cmd search mybucket name:bob

 Unfortunately, the result said Found 0 results.

 As I wondered if this is a problem related to the erlang terms, I tried
 the same with a JSON data that is found in the Indexing and Querying KV
 Data page but setting application/json to the Content-type.

 {
  name:Alyssa P. Hacker,
  bio:I'm an engineer, making awesome things.,
  favorites:{
   book:The Moon is a Harsh Mistress,
   album:Magical Mystery Tour
  }
 }


 Then the Riak Search could find this document with the query
 name:Alyssa*.

 According to the Indexing and Querying KV Data page of the Riak
 document, erlang terms can be queried by using the Riak Search. However, it
 is unclear to me if it is enabled by default because the page says XML,
 JSON, and plain-text encodings are supported out of the box but it doesn't
 mention the erlang terms.

 I followed the Other Data Encodings section and set the
 riak_search_kv_erlang_extractor module, but it didm't change the
 situation.

 curl -XPUT -H 'content-type: application/json' \
 http://localhost:8098/riak/mybucket \
 -d 
 '{props:{search_extractor:{mod:riak_search_kv_erlang_extractor, 
 fun:extract, arg:my_arg}}}'


 I changed the data as follows and uploaded but it didn't help either.

 [{name, bob}]

 I appreciate any hints for me to go forward. Thank you in advance.

 Best regards,
  Takeshi



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Yokozuna 0.2.0 [ANN]

2013-01-02 Thread Ryan Zezeski
Hello Riak Users,

For those of you who missed the announcement on Monday, Yokozuna 0.2.0 has
been released.  This release includes two big features that I discussed in
my RICON talk [1] but hadn't yet been completed.

1) Active Anti-Entropy (AAE): This is a process that constantly verifies
that the data and it's corresponding indexes are in-sync.  This is done in
the background, in an efficient manner, that should require no intervention
on the user's part.

2) Sibling Support: If allow_mult is set to true, meaning you want Riak to
store siblings when there is a conflict, then Yokozuna will index all
siblings.  If a search matches any sibling of an object then it will be
included as a result.  When an object's siblings are reconciled to one
version then all sibling indexes will be deleted.

To see a full list of changes checkout the Basho blog post linked below or
go directly to the 0.2.0 release notes if you prefer [2].

http://basho.com/blog/technical/2012/12/31/yokozuna-pre-release-0.2.0-now-available/

-Z

[1]: http://vimeo.com/54266574

[2]:
https://github.com/rzezeski/yokozuna/blob/master/docs/RELEASE_NOTES.md#020
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak search query timeout issues with 1.2.1 stable

2012-12-20 Thread Ryan Zezeski
On Thu, Dec 20, 2012 at 9:51 AM, Abhinav Singh abhinavsi...@ymail.comwrote:



 error.log on riak@172.17.3.82 contains:
 2012-12-20 16:27:37.877 [error] 0.1821.0@mi_server:handle_info:524
 lookup/range failure:
 {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]}
 2012-12-20 16:27:37.878 [error] emulator Error in process 0.4075.0 on
 node 'riak@172.17.3.82' with exit value:
 {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]}

 2012-12-20 16:27:37.878 [error] 0.1940.0@mi_server:handle_info:524
 lookup/range failure:
 {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]}
 2012-12-20 16:27:37.882 [error] emulator Error in process 0.4077.0 on
 node 'riak@172.17.3.82' with exit value:
 {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]}


This is a very specific error and is an indication that the lambda created
in riak_search_client cannot be instantiated after being sent over the
network.  As I said, I have only ever seen this with mixed versions.



 We don't really have mixed riak release. But yes we do have a mixed erlang
 releases. Not sure if that makes any difference here.

 riak@172.17.3.82
 Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:2:2] [rq:2]
 [async-threads:0] [kernel-poll:false]

 riak@172.17.3.63
 Erlang R15B (erts-5.9) [source] [64-bit] [smp:4:4] [async-threads:0]
 [hipe] [kernel-poll:false]


I bet this is it.  In order for a lambda to be reconstructed after being
sent over the wire very specific conditions need to be met.  My guess is
either the erlang version is checked explicitly or is part of the module
hash, thus causing this failure.  Check out this post by Kresten Krab Thorup.

http://www.javalimit.com/2010/05/passing-funs-to-other-erlang-nodes.html



 

 Unfortunately none of these error happens on our local dev environment.
 On my local dev box, I run a 5 node cluster (ofcourse all nodes on same
 physical machine).


Yes, because they are all using the same erlang version.
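A quick way to confirm what each node is running is to attach to any node
and ask them all at once:

$ riak attach
(riak@host)1> rpc:multicall([node() | nodes()], erlang, system_info, [otp_release]).

Every node should report the same OTP release.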

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search - searching across all fields

2012-12-18 Thread Ryan Zezeski
On Sat, Dec 15, 2012 at 12:43 AM, Matt Painter m...@deity.co.nz wrote:

 Thanks so much Ryan - yokozuna sounds most promising. If I were building a
 small system (relatively simple, small user base) that will be
 production-ready in a few months, do you think that Yokozuna could cut the
 mustard? I see that it's officially an experimental prototype, but do you
 think it's stable 'enough' in its current state? Sorry if this is an
 impossible question to answer with too many variables...


My hope is to start delivering packages of Riak/Yokozuna by late February.
 They probably wouldn't be official Riak packages, but would allow for
easier installation for those that don't want to use AMI/source.  Compared
to Riak Search, Yokozuna will do better in almost all cases but a few.  In
a few cases Riak Search has an upper hand in latency/throughput but I have
tracked down the cause and will be making some patches to Solr's
distributed search soon.  Otherwise, Yokozuna is better in every way.
 Language support, analyzer support, features, performance, robustness,
etc.  Over the next couple of months I hope to start publishing benchmarks
and other information.

That said, this is still experimental, and I'm not sure I would recommend
using Yokozuna in production just yet.  But I would love to find users to
prototype with to see how well Yokozuna can handle various use cases.  If
this sounds interesting to you please send me a direct email.



 I must confess that using a forked Riak makes me a touch queasy for
 anything other than playpen stuff. Do you think that a combo deal of Riak +
 elasticsearch could be a suitable compromise for the time-being?


The fork of Riak used by Yokozuna is extremely minimal.  It mostly consists
of bundling the yokozuna library and sending the KV data so it can be
indexed.  The goal is and will continue to be to make _minimal_ changes
outside of Yokozuna.

You could certainly combine Riak and ES.  Other users of ours have done it.
 Honestly, go with whatever works for you.  No need to wait for Yokozuna if
you think you can get it done today with other tools.  I will note,
however, that something like that will not be as tightly integrated as
Yokozuna.  Which isn't to say it's bad, it's just a trade off to be aware
of.  E.g. Yokozuna has built-in active anti-entropy (AAE) with Riak--when
data becomes divergent AAE will detect it and fix it for you without
requiring action on your part.  You won't get that with Riak + external
solution (without writing your own code to do it).


-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak search query timeout issues with 1.2.1 stable

2012-12-14 Thread Ryan Zezeski
Hi, comments inline

On Wed, Dec 5, 2012 at 8:10 AM, Abhinav Singh abhinavsi...@ymail.comwrote:


 We are facing an issue where search queries works fine on my local dev box
 (which have riak-1.2.1rc2 installed).
 However same queries timeout on our production boxes (which have
 riak-1.2.1 installed):

 2012-12-05 14:49:59.777 [error] 0.1035.0@mi_server:handle_info:524
 lookup/range failure:
 {{badfun,#Funriak_search_client.9.56347389},[{mi_server,iterate,6},{mi_server,lookup,8}]}


Did you recently upgrade your production boxes?  The 'badfun' error is an
indication that you currently have a mixed cluster.  The error will occur
when two or more machines are involved and they are not all the same
version.  This is a bug in Riak Search.



 This query does succeed sometimes (1-5%), but fails most of the times.
 I want to know if the above logs indicate towards a particular error with
 our riak cluster?


Yes, so in 1-5% of the cases the nodes involved in a query are all the same
version.  The reason this is non-deterministic is that Riak Search uses
some randomness during query time to help spread load around.



 Since this query has never failed on my local development box,
 I suspect either it has to do with something that changed between 1.2.1rc2
 and 1.2.1-stable release or something that is related to our production
 riak cluster.



As I said above, I strongly suspect a mixed cluster scenario.  That's the
only time I've seen an error like the above.  The second email also
strongly indicates a mixed cluster scenario given the behavior you are
seeing.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search - searching across all fields

2012-12-14 Thread Ryan Zezeski
Matt, comments inline


On Tue, Dec 11, 2012 at 3:35 AM, Matt Painter m...@deity.co.nz wrote:


 Apart from a single default value, is it possible for Riak Search to
 search for a keyword across all fields in a document without having to
 specify the field up front as a prefix in one's search term?


A field must be specified to search, but a default field may be specified
in the schema [1].  This field will be searched if one is not specified.
 But there is no way to do a search against all fields.  It is always over
one field.


 I'm guessing that one solution could be a post-commit hook which
 recursively iterates over all fields and squashes them into a secondary
 default value field - but since I know even less about Erlang and am just
 starting out with Riak, I thought it prudent to see if there was a more
 straightforward solution...


Your use case immediately makes me think of Solr copy fields.  You index
everything under their individual fields but all values get copied into a
catch-all field so that all content may be searched easily.  However, with
this you lose the ability to know which field it came from.  Riak Search
doesn't have copy field functionality.  You'd have to concatenate all the
data into a field on your application side.  The new search solution I've
been working on, Yokozuna, uses Solr underneath and therefore does support
copy fields [2].

You could create a pre-commit hook to do this field-squashing but I think
you would be better off doing it in your application.  To do it via a hook
you'd have to make sure it runs before the search hook (I can't remember if
you can force specific order of pre-commit hooks).  It would also have an
effect on your write latencies as more pre-processing would have to be
done.  Finally, you would have to write Erlang.


 The use case is this:

 We are providing an object + metadata store for users to deposit files and
 any number of related fragments of structured JSON metadata. We are not
 enforcing any metadata schema - and therefore can't know up-front any field
 names - but would like the ability for a dumb keyword search from a website
 to return references to the records they have deposited in
 Riak. Essentially, providing a Google-like interface.

 (As a side question, Is Riak Search mature enough for these type of very
 generic searches? I know that it's inspired by Lucene and Lucene-like,
 but I don't know how many of Lucene's goodies are present - or is it just a
 case of invoking analysers provided by Lucene for things like stemming, and
 all will be pretty much equivalent for most situations?)


There are no goodies present _at all_.  Riak Search is an in-house
implementation, completely written in Erlang.  Its only connection to
Lucene/Solr is a superficial interface that looks very much like
Lucene/Solr.  E.g. you mention stemming, there is no stemming support in
Riak Search and would be a non-trivial addition.  This is one of the big
reasons Yokozuna is being written [2].  The world of search is vast and
complicated, best to start with proven solution and build from that.

Riak Search generally starts causing pain when you have searches that match
tens of thousands of documents.  The runtime is proportional to the size of
the result set.  In fact, Riak Search has a hard-coded upper limit to fail
queries that match 100K or more documents (although it does the work to get
100K results and then drops it all on the floor so you still use
resources/time).  For example, if a lot of your files were pictures and
were tagged with something like {"type":"picture"} then a search for
"picture" is probably going to cause issues.  Things really start to hurt
when you do conjunction queries with multiple large result sets, e.g.
"funny AND picture".  Once again, this is not the case with Yokozuna, which
in my benchmarking thus far has shown flat latencies regardless of result
set size.

-Z

[1]:
http://docs.basho.com/riak/latest/cookbooks/Riak-Search---Schema/#Defining-a-Schema

[2]: https://github.com/rzezeski/yokozuna
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak over LAN vs WAN

2012-11-19 Thread Ryan Zezeski
Jon,

On Mon, Nov 19, 2012 at 3:28 AM, Jon Perez jbperez...@yahoo.com wrote:


 Why exactly are these reasons?  I suspect they have to do with performance.
 If so, how exactly does performance degrade when nodes are spread out
 between hosts with latencies in the tens to hundred plus milliseconds (as
 is
 typical over WANs)?


Yes, latency is a big reason.  Every Riak request involves N vnodes.  If
those vnodes are spread across different regions with varying latencies
then your deviation grows and higher percentiles go through the roof.  For
some this may be okay, but the wider you go the more unpredictable your
latency profile becomes.  Predictable latency is key to many applications
built on top of Riak.  Many of these apps are web-apps with tight
constraints on the maximum time any request should take.  Given that most
webapps are made up of many components behind the scenes, it is vital that
each individual component delivers as predictably as possible so that the
developers can have some confidence in the end-to-end latency of a request.
 This is a point made very clear in the Dynamo paper which heavily
influenced the design of Riak.


 Or does it have more to do with reliability of connections and overhead in
 retrying them?  I can understand that Bashio has a vested interest in
 promoting Riak Enterprise for this, but it would be nice if the technical
 details of why were actually laid out in detail.


This is another reason.  A node that is dead is indistinguishable from a
node that is simply taking a really long time to respond.  Once you spread
nodes across a WAN the chance for network failure, and thus network
partitions, becomes much greater.  Riak is designed to always be available
for writes but you still want to avoid partitions as much as possible.
 Partitions are one of the primary causes of siblings, potentially
generating lots of sibling resolution.  Partitions also cause additional
load to be placed on the nodes.  Say you had a 6-node cluster configured as
2 3-node clusters in different data centers.  If the link between the data
centers goes down or becomes too slow you'd end up with a partition between
the 2 3-node clusters and each would have to take on the load that was
being served by the 6-node cluster.  This includes things like disk space,
file descriptors, open ports, memory usage, CPU usage, network utilization,
etc.



 From my limited experience with Riak, getting multiple nodes within a
 cluster going is extremely simple, whereas going multiple clusters is a
 very
 different story and requires a new layer of understanding.  It's too bad
 that there is a distinction between nodes over WANs and LANs.  I guess
 the holy grail of dbs is still some ways off, although Riak seems to be the
 closest fit right now.



The way Riak is designed is far from the most efficient way to replicate
data across a WAN.  A lot of that code was/is written with assumptions of
LAN and fairly predictable latency.  This is one of the reasons we have a
separate piece of software that performs this task.  This problem is not as
easy as some people think.  You should check out Andrew Thompson's RICON
talk on Riak's WAN replication.

http://vimeo.com/52016325

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak over LAN vs WAN

2012-11-19 Thread Ryan Zezeski
On Mon, Nov 19, 2012 at 10:47 AM, Ryan Zezeski rzeze...@basho.com wrote:




 The way Riak is designed is far from the most efficient way to replicate
 data across a WAN.  A lot of that code was/is written with assumptions of
 LAN and fairly predictable latency.  This is one of the reasons we have a
 separate piece of software that performs this task.


I think I gave the wrong impression by saying our WAN support is a separate
piece of software.  It is a separate set of code but is tightly integrated
with our enterprise version of Riak.  Just wanted to clarify.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search Field Aliases

2012-11-05 Thread Ryan Zezeski
Brian,

First, our documentation is wrong.  Sorry.  The correct way to add aliases
looks like so:

 {field, [
   {name, "Name"},
   {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}},
   {alias, "LastName"},
   {alias, "FirstName"},
   {alias, "MiddleName"}
 ]},

Second, it looks like you want the semantics of a Solr copyField.  That is,
index the first, middle, and last names individually but also copy their
contents into the field `Name` so that a user can easily search against the
entire name.  Unfortunately Riak Search's alias mechanism doesn't provide
this semantic even if the documentation might give that impression.  An
alias does 2 things:

1. If a field exists with the same name as the alias then index it under the
containing field name.  E.g. if the field 'LastName' exists then index it
under 'Name'.  Riak Search only indexes a field once.  So either the alias
'FirstName' is found and indexed under 'Name' or 'FirstName' is found and
indexed under 'FirstName'.  This means that if you declare both an alias
and a normal field with the same name the order in the schema will
determine which one wins.  Both will not be used.

2. If there are multiple aliases for a given field then concatenate the
values of every alias to form one field value.  The order they are
concatenated is the same order as they are declared in the object being
indexed.  Thus if you happened to index {"M":"Middle", "F":"First", "L":"Last"} then
the 'Name' field would have value 'Middle First Last'.

To achieve your goal you need to copy the field yourself.  Declare the
'Name' field like the other fields.  Don't use aliases.  Index the object
as {"FirstName":"First", "MiddleName":"Middle", "LastName":"Last",
"Name":"First Middle Last"}.
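A minimal sketch of that (bucket, key, and values are examples):

curl -XPUT -H 'content-type: application/json' \
  'http://localhost:8098/riak/people/p1' \
  -d '{"FirstName":"First","MiddleName":"Middle","LastName":"Last","Name":"First Middle Last"}'

A query against Name (or whatever your default field is) will then match on
any of the three parts.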

-Z

P.S. The new search solution I've been working on, Yokozuna, integrates
Solr with Riak and thus supports copy fields.
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-November/010042.html

On Sun, Oct 28, 2012 at 4:54 PM, Brian Hodgen brian.hod...@gmail.comwrote:

 Can somebody explain in more detail how the aliases parameter on the
 search schema definition works? The documentation says it lets me
 index multiple fields into one, so I tried to setup some schemas to let me
 search on Name, that is actually the combined data of FirstName, LastName,
  MiddleName. I've got the search working for the properties by themselves,
 but I can't seem to make the aliases work, so I'm either doing something
 wrong or I misunderstood how they are supposed to work.


 Schema Example:

 I would really like this to work...but querying on Name never returns any
 results.

 {
 schema,
 [
 {version, 1.1},
 {n_val, 3},
 {default_field, Name},
 {analyzer_factory, {erlang, text_analyzers, noop_analyzer_factory}}
 ],
 [
 {field, [
 {name, FirstName},
 {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}
  ]},
 {field, [
 {name, MiddleName},
 {analyzer_factory, {erlang, text_analyzers,
 standard_analyzer_factory}}
 ]},
  {field, [
 {name, LastName},
 {analyzer_factory, {erlang, text_analyzers, standard_analyzer_factory}}
  ]},
 {field, [
 {name, Name},
 {analyzer_factory, {erlang, text_analyzers,
 standard_analyzer_factory}},
 {aliases, [LastName,FirstName,MiddleName]}
  ]},
 {dynamic_field, [
 {name, *},
 {skip, true}
  ]}
 ]
 }.


 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search not working any more

2012-11-03 Thread Ryan Zezeski
Martin,

On Wed, Oct 31, 2012 at 6:01 PM, Martin Streicher 
martin.streic...@gmail.com wrote:


 I deleted my database today using rm -rf on the data directory. I stopped
 Riak before the delete, and restarted after recreating that directory with
 mkdir.

 Now, I cannot search.


 The data directory includes the ring file.  The ring file is where custom
bucket properties are stored like the search pre-commit hook.  By deleting
the ring those hooks are lost and need to be re-installed.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Search not working any more

2012-11-03 Thread Ryan Zezeski
On Sat, Nov 3, 2012 at 4:33 PM, Martin Streicher martin.streic...@gmail.com
 wrote:


 Is there a programmatic way to achieve the equivalent of search-cmd
 install zids, so that whenever my application launches, it can enable those
 settings?


Yes.  At startup you could add the 'search' bucket property with a value of
'true'.

curl -XPUT -H 'content-type: application/json' 'http://localhost:8098/riak/foo' -d '{"props":{"search":true}}'

That will cause the pre-commit hook to be added.
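
If you want your application to do the same thing at startup from Erlang,
here is a rough sketch using OTP's inets/httpc (the bucket name 'foo' and
the localhost URL are placeholders; untested, so verify against a dev node):

    %% PUT the bucket props over Riak's HTTP interface; any 2xx status
    %% means the search pre-commit hook was installed.
    application:start(inets),
    URL  = "http://localhost:8098/riak/foo",
    Body = "{\"props\":{\"search\":true}}",
    {ok, {{_, Status, _}, _, _}} =
        httpc:request(put, {URL, [], "application/json", Body}, [], []),
    true = Status >= 200 andalso Status < 300.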
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting items from search index increases disk usage

2012-11-02 Thread Ryan Zezeski
On Fri, Nov 2, 2012 at 9:52 AM, Jeremy Raymond jeraym...@gmail.com wrote:

 Some files changed and some didn't.
 Not really sure how to interpret the differences.


Another thing, compacting will occur only if there are 6 or more active
segments.  So once you get down to 5 segments or less compaction becomes a
noop.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting items from search index increases disk usage

2012-11-02 Thread Ryan Zezeski
Active is any segment file that has the suffix .data.

[Sent from my iPhone]

On Nov 2, 2012, at 11:11 AM, Jeremy Raymond jeraym...@gmail.com wrote:

 When do segments become active/inactive?
 
 --
 Jeremy
 
 
 On Fri, Nov 2, 2012 at 10:50 AM, Ryan Zezeski rzeze...@basho.com wrote:
 
 
 On Fri, Nov 2, 2012 at 9:52 AM, Jeremy Raymond jeraym...@gmail.com wrote:
 Some files changed and some didn't.
 Not really sure how to interpret the differences.
 
 Another thing, compacting will occur only if there are 6 or more active 
 segments.  So once you get down to 5 segments or less compaction becomes a 
 noop.
 
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting items from search index increases disk usage

2012-11-02 Thread Ryan Zezeski
Jeremy,

On Fri, Nov 2, 2012 at 12:31 PM, Jeremy Raymond jeraym...@gmail.com wrote:

 I cycled through the compaction on another node. Again after 3 rounds
 compaction has stopped. On one node the merge index is 26 GB on the other
 21 GB. So it looks like I've hit the 5 segment compaction no-op condition
 on both nodes.


I concur.  This condition seems arbitrary to me and I'm not sure if there
is a good reason for it to exist.  But it's there and the only way we could
remove it for you is to hot-load a new beam.


 What would account for the difference in merge_index size? Shouldn't these
 be relatively the same? There must still be tombstones in there...


Riak Search uses term-based partitioning.  It could be that you have some
terms that are more frequent than others which would account for some of
the difference.



 On my production cluster the merge_index is ~44GB. I estimate that
 approximately 90 - 95% of the index data belongs to the bucket I no longer
 want indexed. Manually deleting items from the index then manually
 triggering compaction doesn't look like it will scale. Will this workflow
 work to re-build the search index. I need to keep the cluster available for
 writes while doing this:

 1. In a rolling fashion, disable Riak Search one node at a time.
 2. Delete the contents of the merge_index on each node.
 3. In a rolling fashion, re-enable Riak Search on each node.
 4. Reindex the items to be included in the search index.


No, instead of disabling Riak Search you'll want to take the nodes down one
at a time, remove the merge index data, restart.  After doing this for all
nodes then re-index your data.



 This should do the trick right? Do I need to disable search before
 clearing out the merge_index folders or would disabling the search index on
 the buckets via search-cmd be enough (and then re-enabling) before
 re-indexing?


Again, don't bother disabling search.  The key is to take the nodes down
because merge index caches stuff in memory.

Actually, I thought of another way to achieve the same result without
taking the nodes down.  If you have a non-production cluster to test this
on that would be a good precaution.  I'm 99% sure this should work without
issue.

1. Make sure no indexes are incoming, do this either at your client or
uninstall all search hooks

For each node:

2. Get a list of the MI Pids like in the manual compaction example
3. For each MI Pid call merge_index:drop(MIPid)
3a. Verify the data files were removed on disk

After performing steps 2 & 3 on each node:

4. Re-write the objects you want indexed (of course remember to re-install
the hooks if you removed them in step 1)
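
A rough console sketch of steps 2 and 3 (the registered name merge_index_sup
is my assumption about how the merge_index servers are supervised -- the
manual compaction gist is the authoritative way to collect the pids, so
verify this on a test node first):

    %% Collect the merge_index server pids on this node and drop their data.
    MIPids = [Pid || {_Id, Pid, _Type, _Mods}
                         <- supervisor:which_children(merge_index_sup),
                     is_pid(Pid)],
    [merge_index:drop(MIPid) || MIPid <- MIPids].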

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


ANN: Yokozuna 0.1.0

2012-11-01 Thread Ryan Zezeski
Riak Users,

I'm happy to announce the first alpha release of Yokozuna.  Yokozuna is a
new take on providing search support for Riak.  It tightly integrates Solr
4.0.0 and the Riak master branch.

I'm very excited about Yokozuna.  It brings the power of Solr search to
Riak.  This means language support, analyzer support, and advanced querying
support including boolean, ranked, facet, and spatial.  You can even query
Yokozuna with existing Solr clients*!  Basically, if Solr supports it
Yokozuna probably does too.  On the other side of things Riak uses its
great distributed bits to replicate and scale out Solr.  Riak provides
support for anti-entropy, handoff, distributed queries, and data
replication.  Together these two technologies complement each other well.

Learn more about the 0.1.0 release:
https://github.com/rzezeski/yokozuna/blob/master/docs/RELEASE_NOTES.md

Getting started with Yokozuna:
https://github.com/rzezeski/yokozuna#getting-started

If you would rather not build from source an EC2 AMI is provided that may
be used: https://github.com/rzezeski/yokozuna/blob/master/docs/EC2.md

If you would like to see some very high-level diagrams of Yokozuna's
architecture checkout my slides from RICON:
https://speakerdeck.com/basho/yokozuna-ricon

I'm looking for people to work with me directly to prototype solutions
using Yokozuna.  If that sounds interesting please email me directly.  If
you have any questions don't hesitate to ask them on riak-users or email me
directly.

-Z

* - Please note I've only tested this against SolrJ.  See
https://github.com/rzezeski/yokozuna/blob/7abbc3f7430373a58fdefaa65731759344e86cc7/priv/java/com/basho/yokozuna/query/SimpleQueryExample.java
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting items from search index increases disk usage

2012-11-01 Thread Ryan Zezeski
Jeremy,

I was looking at the merge index code and I think the issue is that the
method by which segments are chosen for compaction may be very slow to get
to the larger segments.

1. Merge Index only schedules merging when a buffer is rolled-over to a
segment.  This means there will _always_ be at least one small segment in
the list of potential segments to merge.

2. To determine which segments to merge the mean of all segment sizes is
taken.

Over time the mean will skew left of the bulk of the distribution.  This
means most compactions will touch only recent, smaller segments and it will
take many iterations before one of the larger ones is included.  To help
verify this you could list all you segment sizes again and compare them
with the last run.  My guess is you'll have about the same number of
segments but the smallest one will have grown a bit.  It depends how much
unique data you re-indexed.

Depending on the distribution of your segment sizes I think it might be
possible to reclaim some of this space via repeated compaction calls.  It
turns out there is a way to manually invoke compaction.  It's just not easy
to get to.  Try running the following gist on one of your nodes
https://gist.github.com/3996286.  Try running merge_index:compact over and
over again and each time check for changes in the segment file sizes.
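
The loop itself boils down to something like this (assuming the MI pids
were collected as in the gist, and assuming merge_index:compact/1 takes the
server pid the same way merge_index:drop/1 does):

    %% Ask every merge_index server on this node to compact, then re-check
    %% the segment sizes with the find/ls commands from earlier in the thread.
    [merge_index:compact(MIPid) || MIPid <- MIPids].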


-Z

On Thu, Nov 1, 2012 at 11:25 AM, Jeremy Raymond jeraym...@gmail.com wrote:

 I reindexed a bunch of items that are still in the search index but no
 disk space was reclaimed. Is there any Riak console Erlang voodoo I
 can do to convince Riak Search that now would be a good time to
 compact the merge_index?

 --
 Jeremy


 On Tue, Oct 30, 2012 at 4:26 PM, Jeremy Raymond jeraym...@gmail.com
 wrote:
  I've posted the list of buffer files [1] and segment files [2].
 
  The current data set I have in Riak is static, so no new items are
  being written. So this looks like the reason as to why compaction
  isn't happening since there is no time based trigger on the merge
  index. To get compaction to kick in, I should be able to just
  reindex (by reading and rewriting) some of the existing items in
  buckets that are still indexed? Earlier today I upgraded to Riak 1.2
  and ran a Search read repair [3] in an attempt to kick off compaction.
  Compaction didn't kick in, but instead disk consumption increased
  again. Should Search Repair trigger compaction or only writing objects
  to the KV store?
 
  [1]:https://gist.github.com/3982718
  [2]:https://gist.github.com/3982730
  [3]: http://docs.basho.com/riak/latest/cookbooks/Repairing-Search-Indexes/#Running-a-Repair
 
  --
  Jeremy
 
 
  On Tue, Oct 30, 2012 at 3:47 PM, Ryan Zezeski rzeze...@basho.com
 wrote:
  find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah
 
  find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Deleting items from search index increases disk usage

2012-10-30 Thread Ryan Zezeski
Jeremy,

This is how Merge Index (the index store behind Riak Search) works.  It is
log-based meaning deletes are first logical before they become physical.
 It does not update in-place as you stated in one of your replies.  When
you performed those deletes new logs were created containing logical
deletes (tombstones) causing more disk to be used.  Assuming other buckets
are still being indexed then compaction should be occurring and tombstones
should be reaped.  Meaning both the logical delete and the datum should be
removed from disk.  If no new indexes are arriving then nothing will be
compacted as there is no time-based trigger on Merge Index.

Merge Index could be doing a bad job of picking which segments to merge,
leaving a high % of tombstones on disk longer than necessary.  I'm curious,
what is the output from the following commands.

find /var/lib/riak/merge_index -name 'buffer.*' | xargs ls -lah

find /var/lib/riak/merge_index -name 'segment.*' | xargs ls -lah


On Mon, Oct 29, 2012 at 8:19 AM, Jeremy Raymond jeraym...@gmail.com wrote:

 So the only way to actually free the disk space consumed by the
 tombstones in the search index is to bring down the cluster and blow
 away the merge index (at /var/lib/riak/merge_index)?



If, and only if, you are no longer indexing _any_ buckets then this would
be the thing to do.  If you are still indexing some buckets then deleting
these files would break their indexes.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: How to search objects using riak_search method

2012-10-28 Thread Ryan Zezeski
On Tue, Oct 9, 2012 at 7:49 AM, 郎咸武 langxian...@gmail.com wrote:


 ((ejabberd@meta)19 f(O),O=riakc_obj:new(user1, jason3,
 list_to_binary([{\name\:\\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2\},{\sex\:\male1\}]),
 application/json).
 {riakc_obj,user1,jason3,undefined,[],
{dict,1,16,16,8,80,48,
  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},

  {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}},

  
 [{\name\:\\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2\},{\sex\:\male1\}]}

 (ejabberd@meta)20 riakc_pb_socket:put(Pid, O).
 ok

 (ejabberd@meta)28  riakc_pb_socket:search(Pid, user1,
 list_to_binary(\sex\:male1*)).  * The operation is ok.*
 {ok,{search_results,[{user1,
   [{id,jason3},
{name,

 195,169,194,131,194,142,195,165,194,147,194,178},
{sex,male1}]}],
 0.0,1}}



Notice the value of the name field in the result here.  It has been
properly converted to a UTF-8 sequence.  That is, at some point Riak Search
took your ASCII string of unicode escapes and converted it to a proper
unicode byte sequence.



 (ejabberd@meta)29  riakc_pb_socket:search(Pid, user1,
 list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2)).

 {ok,{search_results,[],0.0,0}}  *%% But there is empty. Why?*



First off, you are adding additional quotes around the name field.

11 list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2).
\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2

Second, you are searching for the ASCII string
\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2.  At no point is this string
converted to a unicode sequence for you.  This is the correct behavior
because you might have ASCII documents containing unicode escapes.  You
need to query using a proper unicode binary.

19 riakc_pb_socket:search(Pid, user1,
name:,195,169,194,131,194,142,195,165,194,147,194,178).
{ok,{search_results,[{user1,
  [{id,jason3},
   {name,

195,169,194,131,194,142,195,165,194,147,194,178},
   {sex,male1}]}],
0.35355299711227417,1}}


(ejabberd@meta)30  riakc_pb_socket:search(Pid, user1,
 list_to_binary(\name\:\\u00e9\\u0083\\u008e\\u00e5\\u0093\\u00b2*)).

 {ok,{search_results,[],0.0,0}}  *%% There is empty,too. Why?*


20 riakc_pb_socket:search(Pid, user1, name:,195,169,*).

{ok,{search_results,[{user1,
  [{id,jason3},
   {name,

195,169,194,131,194,142,195,165,194,147,194,178},
   {sex,male1}]}],
0.0,1}}
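
Put together as one snippet, the pattern is to build the query from the
bytes that were actually indexed, not from the ASCII escape string (this
just restates the two shell commands above; Pid is an open riakc_pb_socket
connection):

    Indexed = <<195,169,194,131,194,142,195,165,194,147,194,178>>,
    %% Exact match on the indexed value.
    {ok, _} = riakc_pb_socket:search(Pid, <<"user1">>,
                                     <<"name:", Indexed/binary>>),
    %% Prefix (wildcard) match on the first bytes.
    {ok, _} = riakc_pb_socket:search(Pid, <<"user1">>,
                                     <<"name:", 195, 169, $*>>).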

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: This is about riak search question. How to search utf8 format dat?

2012-10-28 Thread Ryan Zezeski
On Wed, Oct 10, 2012 at 12:52 AM, 郎咸武 langxian...@gmail.com wrote:


 *2)To put a Object to user1 bucket. The data is utf8 format.*

 (trends@jason-lxw)123 f(O), O=riakc_obj:new(user1,
 jason5,list_to_binary(mochijson:encode({struct, [{name,
 binary_to_list(unicode:characters_to_binary(爱))},{sex,male}]})),
 application/json).
 {riakc_obj,user1,jason5,undefined,[],
{dict,1,16,16,8,80,48,
  {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},

  {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}},
{\name\:\\\u00e7\\u0088\\u00b1\,\sex\:\male\}}
 (((trends@jason-lxw)124 riakc_pb_socket:put(Pid, O).

 ok


First, let's start with your data and make sure it's getting stored
properly.

3 UC = unicode:characters_to_binary(爱).
231,136,177

Okay, so Erlang properly decoded this into a 3-byte unicode sequence.  What
does mochijson2 think? (I noticed you are using mochijson, I recommend using
mochijson2).

4 mochijson2:encode({struct, [{name, UC}]}).
[123,[34,name,34],58,[34,\\u7231,34],125]

Good, mochijson2 properly interpreted this as u7231.  A quick lookup on the
web verifies this is correct:
http://www.fileformat.info/info/unicode/char/7231/index.htm.

But notice in your code you call binary_to_list on the binary before
passing it to mochi.  Let's see what happened.

15 binary_to_list(UC).
[231,136,177]

Okay, so the integers are correct.  But Erlang treats lists differently
from binaries.  It's just a list of integers to Erlang.

16 io:format(~ts~n,[binary_to_list(UC)]).
爱
ok

This is why mochi converted it to 3 characters: \\u00e7\\u0088\\u00b1

To make a proper unicode list the unicode:characters_to_list function must
be used.

17 UCS = unicode:characters_to_list(爱).
[29233]

18 io:format(~ts~n, [UCS]).
爱
ok

Let's try encoding again, but this time leave out the list_to_binary.

19 riakc_obj:new(user1, jason5, mochijson2:encode({struct,
[{name, unicode:characters_to_binary(爱)}]}), application/json).
{riakc_obj,user1,jason5,undefined,[],
   {dict,1,16,16,8,80,48,
 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},

 {{[],[],[],[],[],[],[],[],[],[],[[...|...]],[],[],...}}},
   [123,[34,name,34],58,[34,\\u7231,34],125]}

And there we go.  A properly encoded unicode character.
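
If you want to sanity check the round trip, mochijson2 will decode that
JSON back to the original UTF-8 bytes (a quick sketch; {struct, Proplist}
is mochijson2's default decode shape):

    %% The \u7231 escape decodes back to the UTF-8 bytes for the character.
    {struct, [{<<"name">>, UC2}]} = mochijson2:decode("{\"name\":\"\\u7231\"}"),
    <<231,136,177>> = UC2,
    [29233] = unicode:characters_to_list(UC2).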

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: riak search - creating many indexes for one inserted object

2012-10-19 Thread Ryan Zezeski
Pawel,

On Tue, Oct 9, 2012 at 5:21 PM, kamiseq kami...@gmail.com wrote:

 hi all,

 Right now we are using Solr as our search index and we are inserting data
 manually, so there is nothing to stop us from creating many indexes
 (sort of views) on the same entity, aggregating data, and so on.
 Can something like that be achieved with Riak Search?


Just to be sure I understand you.  When you say many indexes do you mean
something like writing to multiple Solr cores?  If so, no, Riak Search
cannot do that.  It writes to an index named after the bucket you have the
hook on.


 I think that commit hooks are a good point to start with, but as I read
 the search index is kept in a different format than the bucket data and I
 would love to still use a Solr-like API to search the index.


Yes, Riak Search stores index data in a backend called Merge Index.  Riak
Search has a Solr _like_ interface but it lacks many features, and doesn't
have the same semantics or performance characteristics.

There is a new project underway called Yokozuna which tightly integrates
Riak and Solr.  If you like Solr then keep an eye on this.  I'm looking for
people who want to prototype on it so if that interests you please email me
directly.

https://github.com/rzezeski/yokozuna


 example

 I have two entities, cars and parking_lots; each car references the
 parking lot it belongs to.
 When I create/update/delete a car object I would like to not only update
 the car index (so I can search by car type, name, number plates, etc.) but
 also update the parking index to easily check how many cars a given lot
 has (plus search lots by cars, or search cars with a given property).


Why have a separate index at all?  Is it not good enough to have just the
car index?  Each doc would have a 'parking_lot_s' field.

How many cars a given lot has -- would be numFound on q=parking_lot_s:foo.

Search lots by cars -- I'm guessing you mean something like tell me what
lots have cars like this, sounds like a facet on 'parking_lot_s', right?

Search cars with a given property -- like the last query but no facet.
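
To make that concrete, all three of those queries can run against the one
car index (field names like color_s are made up for illustration; numFound
is the last element of the search_results tuple):

    %% How many cars does lot foo have?
    {ok, {search_results, _Docs, _MaxScore, NumFound}} =
        riakc_pb_socket:search(Pid, <<"cars">>, <<"parking_lot_s:foo">>),

    %% Cars in lot foo with a given property -- still the same index.
    {ok, _Results} =
        riakc_pb_socket:search(Pid, <<"cars">>,
                               <<"parking_lot_s:foo AND color_s:red">>).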


 Probably all this can be achieved in many other ways. I can imagine
 storing an array of direct references in the parking object and updating
 this object when the car object changes, but this way I need to issue two
 asynchronous write requests with no guarantee that both will be
 persisted.


Yes.  This is a problem with two Solr cores as well.  I'm not sure if this
is a toy example but I don't see the need for 2 indexes.  I potentially see
2 buckets: 'cars' and 'lots'.  But that doesn't mean it has to be two
indexes.  Does that make sense?

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak Search

2012-10-16 Thread Ryan Zezeski
On Sun, Oct 14, 2012 at 12:33 AM, Pavel Kogan pavel.ko...@cortica.comwrote:


 1) Does enabling search have any impact on read latency/throughput?


If you are reading and searching at the same time there is a good chance it
will.  It will cause more disk seeks.


 2) Does enabling search have any impact on RAM usage?


Yes, the index engine behind Riak Search makes heavy usage of Erlang ETS
tables.  Each partition has an in-memory buffer as well as an in-memory
offset table for every segment.  It also uses a temporary ETS table for
every write to store posting data.  The ETS system limit can even become an
issue in overload scenarios.


 3) In production we have no search enabled. What is the best way to
 enable search without stopping production? I thought about something like:
 1) Enable search node by node.


You could change the app env dynamically but that's only half the problem.
 The other half is then starting the Riak Search application.  I think
application:start(merge_index) followed by application:start(riak_search)
should work but I'm not 100% sure and this has not been tested.  You'll
also want to make sure to edit all app.configs so that it is persistent.
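
From riak attach that would look roughly like the lines below (again,
untested; the 'enabled' flag mirrors the {riak_search, [{enabled, true}]}
entry in app.config):

    %% Turn search on for this node and start the applications.  Repeat on
    %% each node, one at a time, and persist the change in app.config.
    application:set_env(riak_search, enabled, true),
    application:start(merge_index),
    application:start(riak_search).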



2) Execute some nightly script that runs over all keys and overwrites them
 back with the proper mime type.


Yes, you'll want to install the commit hook on the buckets you wish to
index.  Then you'll want to do a streaming list-keys or bucket map-reduce
and re-write the data.
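
A rough sketch of that nightly re-write using a streaming list-keys from
the Erlang client (the bucket name is a placeholder; for a large bucket you
would want to pace the writes):

    %% Stream the keys and re-put each object so the newly installed commit
    %% hook indexes it on the way back in.
    rewrite_bucket(Pid, Bucket) ->
        {ok, ReqId} = riakc_pb_socket:stream_list_keys(Pid, Bucket),
        rewrite_loop(Pid, Bucket, ReqId).

    rewrite_loop(Pid, Bucket, ReqId) ->
        receive
            {ReqId, {keys, Keys}} ->
                [begin
                     {ok, Obj} = riakc_pb_socket:get(Pid, Bucket, Key),
                     ok = riakc_pb_socket:put(Pid, Obj)
                 end || Key <- Keys],
                rewrite_loop(Pid, Bucket, ReqId);
            {ReqId, done} ->
                ok
        end.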



4) If we see that the search overhead is something we can't handle, is there
 a simple way to disable it without stopping production?


I think the best course of action in this case would be to disable the
commit hook.  But you would have to keep track of anything written during
this time and re-write it after re-installing the hook.  If you don't then
you'll have to re-index everything because you don't know what you missed.

5) In what case would we need repair? It is said to be needed on replica
 loss, but if I understand correctly we have 3 replicas on different nodes,
 don't we? If it happens, how difficult and how long would it be for a large
 cluster (about 100 nodes)?


Repair is on a per partition basis.  Number of nodes doesn't come into
play.  Repair is very specific in that it requires the adjacent partitions
to be in a good, convergent state.  If they aren't then repair isn't much
help.

A lot of these entropy issues go away in Yokozuna.  Repairing indexes is
done automatically, in the background, in an efficient manner.  There is no
need to re-write data or run manual repair commands.

-Z
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

