Re: Did a force-remove of two nodes, now system is unresponsive

2014-08-19 Thread Ciprian Manea
Hi Marcel,

What is the configured ring size for this cluster?

You can slow down the transfers by running $ riak-admin transfer-limit 1 on
one of your Riak nodes. iowait should decrease as well once the transfer limit
is lowered, unless one of your disks is failing or about to fail.


Regards,
Ciprian


On Mon, Aug 18, 2014 at 9:18 PM, marcel.koopman marcel.koop...@gmail.com
wrote:

 We have a 5-node Riak cluster.
 Two nodes in this cluster had to be removed because they have been
 unavailable for half a year, so a force-remove was done.

 After this, the 3 remaining nodes began to transfer all data, and we ended
 up with a completely unresponsive system. The iowait is blocking us now.
 We are hoping that this will settle today; the next planned change was
 actually adding two new nodes.

 And yes, this is production. Is there any chance we lost data?



 --
 View this message in context:
 http://riak-users.197444.n3.nabble.com/Did-a-force-remove-of-two-nodes-now-system-is-unresponsive-tp4031603.html
 Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Bitcask Key Listing

2014-08-19 Thread Jason Campbell
I currently maintain my own indexes for some things, and use natural keys where 
I can, but a question has been nagging me lately.

Why is key listing slow?  Specifically, why is bitcask key listing slow?

One of the biggest issues with bitcask is all keys (including the bucket name 
and some overhead) must fit into RAM.  For large amounts of keys, I understand 
the coordination data transfer will hurt, but shouldn't things like list 
buckets (or listing keys from small buckets) be fast?

Is there a reason this is slow, and is there a plan to fix it?

Thanks,
Jason

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Fwd: RiakCS 504 Timeout on s3cmd for certain keys

2014-08-19 Thread Alex Millar
Hey Kota,

We’re currently using the following versions;

# Download RiakCS 
# Version: 1.4.5
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O 
http://s3.amazonaws.com/downloads.basho.com/riak-cs/1.4/1.4.5/ubuntu/precise/riak-cs_1.4.5-1_amd64.deb

# Download Riak
# Version: 1.4.8
# OS: Ubuntu 12.04 (Precise) AMD 64
curl -O 
http://s3.amazonaws.com/downloads.basho.com/riak/1.4/1.4.8/ubuntu/precise/riak_1.4.8-1_amd64.deb

I checked our RiakCS app.config and fold_objects_for_list_keys is set to false. 
What impact would it have on my cluster if I flip that to true? Would I simply 
update the app.config and restart RiakCS?

As for the consideration on garbage collection, the slow performance has been 
happening consistently over the span of a week (since we noticed it, as we don’t 
often list buckets). I suspect it’s not a case of large numbers of objects being 
deleted, as generally all data going into that bucket is write-once (we process 
PDF pages to .JPG and PUT them in that bucket; the only time overwrites occur 
is if we manually re-trigger the processing script on a specific document).

Adding ke...@basho.com as we have another thread going on about this same 
topic, I figured we could merge the discussion to reduce duplicate effort here. 

 Alex Millar, CTO  
Office: 1-800-354-8010 ext. 704  
Mobile: 519-729-2539  
GoBonfire.com

From: Kota Uenishi k...@basho.com
Reply: Kota Uenishi k...@basho.com
Date: August 18, 2014 at 10:03:40 PM
To: Alex Millar a...@gobonfire.com
Cc: Charlie Voiselle cvoise...@basho.com, Tad Bickford 
tbickf...@basho.com, Riak-Users riak-users@lists.basho.com, Brandon Noad 
bran...@gobonfire.com
Subject:  Re: Fwd: RiakCS 504 Timeout on s3cmd for certain keys  

Alex,

Riak CS 1.4.5 and 1.5.0 include a lot of improvements made after the articles 
you linked [1]; listing no longer goes through Riak's bucket listing but through 
Riak's internal fold API, which is more efficient. What version of Riak CS are 
you using? Please make sure you're running one of those versions and have the 
line `{fold_objects_for_list_keys, true},` in the riak_cs section of app.config 
(assuming all the other Riak parts are correctly configured).

 Based on this I’m thinking the cost of this type of query is only going to 
get worse over time as we add more keys to this bucket (unless secondary 
indexes can be added). Or am I totally out to lunch here and there’s some 
other underlying problem?

The strange part is s3cmd. Riak CS has an incremental bucket listing API that 
requires clients to iterate over every 1000 objects (or common prefixes), but 
s3cmd iterates over the entire specified bucket before printing anything. You 
can observe how s3cmd and Riak CS interact if you specify the '-d' option like 
this:

```
s3cmd -d -c yours.s3cfg ls -r s3://yourbucket/yourdir/
```
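
For comparison, here is a minimal sketch of consuming the listing incrementally 
with boto (assumed to be installed; the endpoint, credentials, bucket, and 
prefix are placeholders), fetching 1000 keys per request the way the API 
intends:

```python
from boto.s3.connection import S3Connection, OrdinaryCallingFormat

# Hypothetical credentials and endpoint; point these at your Riak CS host.
conn = S3Connection('ACCESS_KEY', 'SECRET_KEY',
                    host='s3.example.com',
                    calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket('yourbucket', validate=False)

marker = ''
while True:
    # Each request returns at most 1000 keys, like the incremental API above.
    page = bucket.get_all_keys(prefix='yourdir/', marker=marker, max_keys=1000)
    for key in page:
        print key.name
    if not page.is_truncated:
        break
    marker = page[-1].name
```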

I would not expect Riak CS's listing API to be so slow as to need 5 seconds 
(or, say, 10 seconds) per request, because each request returns just 1000 
objects.

There is another possible cause of a slow query: if you have many (say, more 
than ten thousand) deleted objects in the same bucket, they can slow down each 
1000-object listing. This will eventually resolve itself as Riak CS's garbage 
collection removes the deleted manifests, which until then are merely marked 
as deleted (and correctly ignored).

[1] 
http://www.quora.com/Riak/Is-it-really-expensive-for-Riak-to-list-all-buckets-Why

On Thu, Aug 14, 2014 at 6:05 AM, Alex Millar a...@gobonfire.com wrote:
Good afternoon Charlie,

So the issue we’re having is only with bucket listing.

alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
s3://bonfirehub-resources-can-east-doc-conversion
                       DIR   
s3://bonfirehub-resources-can-east-doc-conversion/organizations/

real 2m0.747s
user 0m0.076s
sys 0m0.030s

where as…

alxndrmlr@alxndrmlr-mbp $ time s3cmd -c .s3cfg-riakcs-admin ls 
s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals
                       DIR   
s3://bonfirehub-resources-can-east-doc-conversion/organizations/OrganizationID-1/documents/proposals/

real 0m10.262s
user 0m0.075s
sys 0m0.028s

This bucket contains a lot of very small files (basically, for each PDF we 
receive I split out a .JPG for each page and store them here). Based on my 
latest counts it looks like we have around 170,000 .JPG files in that bucket.

Here’s a snippet from the HAProxy log for the 504 timeouts…

Aug 12 16:01:34 localhost.localdomain haproxy[4718]: 192.0.223.236:48457 
[12/Aug/2014:16:01:24.454] riak_cs~ riak_cs_backend/riak3 161/0/0/-1/10162 504 
194 - - sH-- 0/0/0/0/0 0/0 
{bonfirehub-resources-can-east-doc-conversion.bf-riakcs.com} GET /?delimiter=/ 
HTTP/1.1

I’ve put together a video showing the top output on each of the 5 Riak 
nodes while running $ time s3cmd -c .s3cfg-riakcs-admin ls 
s3://bonfirehub-resources-can-east-doc-conversion

https://dl.dropboxusercontent.com/u/5723659/RiakCS%20ls%20monitoring%20results.mov

Now I’ve had a hunch this is just 

Re: Bitcask Key Listing

2014-08-19 Thread Kelly McLaughlin

Jason,

There are two aspects to a key listing operation that make it expensive 
relative to normal gets or puts.

The first is that, due to the way data is distributed in Riak, key listing 
requires a covering set of vnodes to participate in order to determine the 
list of keys for a bucket. A minimal covering set works out to 1/N of the 
vnodes in the cluster, where N is the n_val of the bucket. By default this is 
3, so in the default case a key listing request must send a request to, and 
receive responses from, a third of the vnodes in the cluster. This incurs 
network traversal overhead as the keys from each vnode are returned, and the 
speed to completion is limited by the slowest vnode in the covering set. This 
is true regardless of the backend in use.

The second is specific to bitcask. Bitcask is an unordered backend, and the 
consequence when doing a key listing is that all of the keys stored by a 
participating vnode must be scanned. It doesn't matter if there are 2 keys or 
2000 keys in the bucket being queried; they all must be scanned. This is a 
case where keeping all the keys in memory helps performance, but as the amount 
of data stored increases, so does the expense of scanning over it. The leveldb 
backend is ordered, and we are able to take advantage of that to scan only the 
data for the bucket in question, but for bitcask that is not an option.
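
On the client side, if a listing truly is unavoidable, streaming the result at 
least avoids buffering every key in client memory (a sketch using the Python 
client's stream_keys; the bucket name is hypothetical, and the server-side 
scan described above still happens in full):

```python
# `client` is an existing riak.RiakClient instance.
# Consume keys in chunks as vnodes return them, instead of building one
# giant list in client memory. The full vnode scan still runs server-side.
bucket = client.bucket('mybucket')
for chunk in bucket.stream_keys():
    for key in chunk:
        print key
```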

At this time there is nothing in the works to specifically improve key 
listing performance. It is certainly something we are aware of, but at the 
moment there are other things with higher priority.

Hope that helps answer your question.


Kelly




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Fwd: RiakCS 504 Timeout on s3cmd for certain keys

2014-08-19 Thread Kelly McLaughlin

Alex,

The value you had set for the fold_objects_for_list_keys setting is the one I 
was most interested to see, and I highly recommend setting it to true for your 
cluster. The effect should be to make bucket listing operations generally more 
efficient, with no detrimental effects. There are also some optimizations for 
bucket listing queries that use the prefix request parameter, so I would 
expect queries that list specific subdirectories in a bucket to show improved 
performance as well. Changing the app.config and restarting the CS node is the 
correct way to have it take effect.



As for GC performance, I recommend adding an entry to your app.config to set 
gc_paginated_indexes to true (i.e. `{gc_paginated_indexes, true}` in the 
riak_cs section, alongside `{fold_objects_for_list_keys, true}`). This option 
makes the GC process use a more efficient method for determining which data is 
eligible for collection, and generally results in far fewer timeouts and 
better success for users.


Kelly


Re: Riak Search Issue

2014-08-19 Thread Alex De la rosa
Hi Eric,

You were right about naming the bucket the same as the index... it worked
that way:

bucket = client.bucket_type('futbolistas').bucket('famoso')
results = bucket.search('name_s:Lion*')
print results

{'num_found': 2, 'max_score': 1.0, 'docs': [{u'age_i': u'30', u'name_s':
u'Lionel', u'_yz_rk': u'lionel', u'_yz_rb': u'fcb', u'score':
u'1.e+00', u'leader_b': u'true', u'_yz_id':
u'1*futbolistas*fcb*lionel*59', u'_yz_rt': u'futbolistas'}, {u'age_i':
u'30', u'name_s': u'Lionel', u'_yz_rk': u'lionel', u'_yz_rb': u'famoso',
u'score': u'1.e+00', u'leader_b': u'true', u'_yz_id':
u'1*futbolistas*famoso*lionel*8', u'_yz_rt': u'futbolistas'}]}

Later I will install the Git version and check whether it works with a
different bucket name.

Thanks.
Alex


On Mon, Aug 18, 2014 at 11:12 PM, Alex De la rosa alex.rosa@gmail.com
wrote:

 Hi Eric,

 I will try this suggestion, and I will also try Luke's suggestion of using
 the latest Git version instead of pip, to see if it is something already fixed.

 Once I have done that, I will tell you guys whether it is really a bug or
 whether it was already fixed in the Git clone.

 Thanks,
 Alex


 On Mon, Aug 18, 2014 at 11:10 PM, Eric Redmond eredm...@basho.com wrote:

 Alex,

 You may have discovered a legitimate bug in the python driver. In the
 meantime, if you give your bucket and index the same name, you can proceed,
 while we investigate.

 Thanks,
 Eric


 On Aug 18, 2014, at 2:00 PM, Alex De la rosa alex.rosa@gmail.com
 wrote:

 Yes, I did it on purpose; I had done so much testing that I wanted to start
 fresh... so I kind of translated the documentation, but that is irrelevant
 to the case.

 Thanks,
 Alex


 On Mon, Aug 18, 2014 at 10:59 PM, Eric Redmond eredm...@basho.com
 wrote:

 Your steps seemed to have named the index famoso.

 Eric


 On Aug 18, 2014, at 1:56 PM, Alex De la rosa alex.rosa@gmail.com
 wrote:

 Ok, I found the first error in the documentation; the parameters are in
 reverse order:

 bucket = client.bucket('animals', 'cats')

 should be:

 bucket = client.bucket('cats', 'animals')

 With that fixed I could save, and it found the bucket type:
 bucket = client.bucket('fcb', 'futbolistas') vs.
 bucket = client.bucket('futbolistas', 'fcb')

 However, even fixing that, the next step fails as it was failing before:

 PYTHON:
   bucket = client.bucket('fcb', 'futbolistas')
   results = bucket.search('name_s:Lion*')
   print results
 Traceback (most recent call last):
   File x.py, line 13, in module
 results = bucket.search('name_s:Lion*')
   File /usr/local/lib/python2.7/dist-packages/riak/bucket.py, line
 420, in search
 return self._client.fulltext_search(self.name, query, **params)
   File
 /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line
 184, in wrapper
 return self._with_retries(pool, thunk)
   File
 /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line
 126, in _with_retries
 return fn(transport)
   File
 /usr/local/lib/python2.7/dist-packages/riak/client/transport.py, line
 182, in thunk
 return fn(self, transport, *args, **kwargs)
   File
 /usr/local/lib/python2.7/dist-packages/riak/client/operations.py, line
 573, in fulltext_search
 return transport.search(index, query, **params)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/transport.py,
 line 564, in search
 MSG_CODE_SEARCH_QUERY_RESP)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py,
 line 50, in _request
 return self._recv_msg(expect)
   File
 /usr/local/lib/python2.7/dist-packages/riak/transports/pbc/connection.py,
 line 142, in _recv_msg
 raise RiakError(err.errmsg)
 riak.RiakError: 'No index fcb found.'

 Again it says the fcb index was not found... and this time I fully followed
 the right documentation and didn't use bucket.enable_search().

 Thanks,
 Alex


 On Mon, Aug 18, 2014 at 10:49 PM, Alex De la rosa 
 alex.rosa@gmail.com wrote:

 Hi Eric,

  I'm sorry, but I followed the documentation that you provided me and it
  still raises issues:

  STEP 1: Create Index: famoso
  PYTHON:
    client.create_search_index('famoso')

  STEP 2: Create Bucket Type: futbolistas
  SHELL:
    riak-admin bucket-type create futbolistas
  '{"props":{"search_index":"famoso"}}'

Re: Riak Search Issue

2014-08-19 Thread Alex De la rosa
Hi Sean,

Yeah, I opted to follow that pattern in my latest attempt, as I find it
clearer than the way in the documentation. Still the same issue, although with
Eric we saw that it works fine when the index and bucket have the same name.

Thanks!
Alex


On Mon, Aug 18, 2014 at 11:27 PM, Sean Cribbs s...@basho.com wrote:

 Don't use bucket() with 2 arguments; use
 client.bucket_type('futbolistas').bucket('fcb'). This makes your
 intent clearer. The 2-arity version of bucket() exists for
 backwards compatibility.
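
 For reference, the full sequence with the explicit chain might look like this
 (a sketch assuming the futbolistas type was created and activated with
 search_index famoso, as in the earlier steps; connection settings are
 placeholders):

```python
import riak

client = riak.RiakClient(pb_port=8087)

# Assumes the index and bucket type from the earlier steps already exist:
#   client.create_search_index('famoso')
#   riak-admin bucket-type create futbolistas '{"props":{"search_index":"famoso"}}'
#   riak-admin bucket-type activate futbolistas
bucket = client.bucket_type('futbolistas').bucket('fcb')

obj = bucket.new('lionel', {'name_s': 'Lionel', 'age_i': 30})
obj.store()

# Querying the index by name sidesteps the bucket-name-as-index problem:
results = client.fulltext_search('famoso', 'name_s:Lion*')
print results
```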

  On Mon, Aug 18, 2014 at 10:49 PM, Alex De la rosa
  alex.rosa@gmail.com wrote:
 
   Hi Eric,

   I'm sorry, but I followed the documentation that you provided me and it
   still raises issues:

   STEP 1: Create Index: famoso
   PYTHON:
     client.create_search_index('famoso')

   STEP 2: Create Bucket Type: futbolistas
   SHELL:
     riak-admin bucket-type create futbolistas
   '{"props":{"search_index":"famoso"}}'
     = futbolistas created
     riak-admin bucket-type activate futbolistas
     = futbolistas has been activated

   STEP 3: Create Bucket and Add data: fcb
   PYTHON:
     bucket = client.bucket('futbolistas', 'fcb')
     c = bucket.new('lionel', {'name_s': 'Lionel', 'age_i': 30,

RE: Riak python client and Solr

2014-08-19 Thread Sapre, Meghna A
Thanks Eric,
  It throws errors for 'group.field'='build.version'.

search_results = riak_client.fulltext_search(self.Result_Index, 
'build.type:CI', group='on', 'group.field'='build.version')
Cannot appear past keyword arguments.

I tried some variations of the same; they did not seem to work.
Any suggestions?

Thanks,
Meghna


From: Eric Redmond [mailto:eredm...@basho.com]
Sent: Tuesday, August 19, 2014 1:12 PM
To: Sapre, Meghna A
Cc: riak-users
Subject: Re: Riak python client and Solr

You don't pass in a query as a url encoded string, but rather a set of 
parameters. So you'd call something like:

search_results = riak_client.fulltext_search(self.Result_Index, 
'build.type:CI', group='on', 'group.field'='build.version')
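
One wrinkle: a quoted string on the left of = is not valid Python, so the
dotted Solr parameter names have to be passed via dictionary unpacking (a
sketch; whether the PBC transport actually forwards these group parameters to
Solr is a separate question, so the HTTP transport may be worth trying too):

```python
# 'group.field' is not a valid Python identifier, so it cannot be written
# as a normal keyword argument; unpack a dict of Solr params instead.
solr_params = {'group': 'on', 'group.field': 'build.version'}
search_results = riak_client.fulltext_search(self.Result_Index,
                                             'build.type:CI', **solr_params)
```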

Eric


On Aug 19, 2014, at 1:08 PM, Sapre, Meghna A 
meghna.a.sa...@intel.com wrote:


Hi,
  I am trying to use the group and stats options with Riak Search. I get the 
expected results with HTTP URLs, but not with the python-riak-client fulltext 
PBC search.
Here's what I'm trying to do:

q = 'build.type:CI&group=on&group.field=build.version'
try:
    search_results = riak_client.fulltext_search(self.Result_Index, q)
except Exception as e:
    print e
    log.exception(e)

This throws an error: no field name specified in query and no default specified 
via 'df' param.

The same query string works without the group options, and the complete string 
works in the http API.
Any suggestions on how to make this work?

Thanks,
Meghna

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Counters inside Maps

2014-08-19 Thread Alex De la rosa
Imagine I have a Riak object footballer with some static fields: name,
team, number. I store them like this now:

1: CREATE INDEX FOR RIAK SEARCH
curl -XPUT "http://148.251.140.229:8098/search/index/ix_footballers"

2: CREATE BUCKET TYPE
riak-admin bucket-type create tp_footballers
'{"props":{"allow_mult":false,"search_index":"ix_footballers"}}'
riak-admin bucket-type activate tp_footballers

3: INSERT A PLAYER
bucket = client.bucket_type('tp_footballers').bucket('footballers')
key = bucket.new('lionelmessi', data={'name_s':'Messi',
'team_s':'Barcelona', 'number_i':10}, content_type='application/json')
key.store()

4: SEARCH FOR BARCELONA PLAYERS
r = client.fulltext_search('ix_footballers', 'team_s:Barcelona')

So far so good :) BUT... what if I want a field goals_i that is a counter,
incremented each match day with the number of goals he scored? What are the
syntax/steps to set up footballers as a MAP and then put a COUNTER inside? I
know it is possible, as I read it in a data dump some Basho employee passed me
some time ago, but I can't manage to see how to do it now.

Thanks!
Alex
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Counters inside Maps

2014-08-19 Thread Sean Cribbs
Alex,

Assuming you've already made your bucket-type with map as the
datatype, then bucket.new() will return you a Map instead of a
RiakObject. Translating your example above:

key = bucket.new('lionelmessi')
key.registers['name'].assign('Messi')
key.registers['team'].assign('Barcelona')
key.counters['number'].increment(10)
key.store()

Note that because Maps are based on mutation operations and not
replacing the value with new ones, you can later do this without
setting the entire value:

key.counters['number'].increment(1)
key.store()

This will also change your searches, however, in that the fields will
be suffixed with the embedded type you are using:

r = client.fulltext_search('ix_footballers', 'team_register:Barcelona')

Hope that helps!
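
Putting it together, a minimal end-to-end sketch (assuming a new bucket type
created and activated with the map datatype, e.g.
riak-admin bucket-type create tp_footballers_map
'{"props":{"datatype":"map","search_index":"ix_footballers"}}'; the type name
here is hypothetical):

```python
# `client` is an existing riak.RiakClient instance.
bucket = client.bucket_type('tp_footballers_map').bucket('footballers')

player = bucket.new('lionelmessi')  # returns a Map for a map-datatype bucket
player.registers['name'].assign('Messi')
player.registers['team'].assign('Barcelona')
player.counters['goals'].increment(2)  # two goals this match day
player.store()

# Embedded fields are indexed with type suffixes (_register, _counter):
r = client.fulltext_search('ix_footballers',
                           'team_register:Barcelona AND goals_counter:[1 TO *]')
```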





-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Counters inside Maps

2014-08-19 Thread Alex De la rosa
Cool! Understood :)

Thanks!
Alex

On Wednesday, August 20, 2014, Sean Cribbs s...@basho.com wrote:

 On Tue, Aug 19, 2014 at 3:34 PM, Alex De la rosa
 alex.rosa@gmail.com wrote:
  Hi Sean,
 
  I didn't create the bucket_type with a map datatype, as at first I was just
  testing simple Riak Search... then it occurred to me: what if I want a
  counter in the data? :)
 
  Your example is pretty straightforward and simple. Just 2 questions:
 
  1. key.counters['number'].increment(1) = no need to define a counter
  datatype somewhere before putting it inside the map, as we normally need in
  simple buckets? If it works automatically, that's great :)

 Yes, it works automatically. All included datatypes are available inside
 maps.

 
  2. If we use number_counter instead of number_i, does Search/Solr
  understand that it is an integer, in case you want to do a range query?
  Somewhere in the docs I read that it is better to use _s for strings, _b
  for binary, _i for integers, etc., so Solr knows how to treat the data... I
  believe there will be no strange behaviour from having _register instead of
  _s and _counter instead of _i, right?

 The default Solr schema that ships with Riak accounts for these
 datatypes automatically and uses the appropriate index field type:

 https://github.com/basho/yokozuna/blob/develop/priv/default_schema.xml#L96-L104

 If you write your own schema, you will want to include or change the
 schema fields appropriately.

 
  Thanks!
  Alex
 
 



 --
 Sean Cribbs s...@basho.com
 Software Engineer
 Basho Technologies, Inc.
 http://basho.com/

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Is it a good practice to make riak a service and automatically start when the machine starts?

2014-08-19 Thread Gavin Huang
Hi,
We have a little uncertainty in our team about whether to have Riak
start automatically when a machine gets rebooted.
It would be convenient if Riak started by default after a machine crashed
and restarted for some reason, but I was wondering: is there any case where
automatically starting a problematic node that then joins the cluster would
cause problems? Do you guys have any ideas?

Thanks.
Gavin
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is it a good practice to make riak a service and automatically start when the machine starts?

2014-08-19 Thread Jared Morrow
Gavin,

I think if you monitor the crashes and reboots, and take note or flag it if
they happen often, that is your cue to investigate the node in more depth.
Having a node go up and down often is clearly a sign of something bad that
should be investigated. For a rare reboot/crash, having Riak start on boot and
come up automatically seems like the more ops-friendly way to handle an event
that should be rare. Because Riak keeps working without all its nodes up,
we've had people forget to start Riak nodes and never notice they were down
for weeks. This is good in that Riak can take it, but not very awesome when
you do bring the node up and it has a lot of handoff work to do to catch up.

-Jared




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Is it a good practice to make riak a service and automatically start when the machine starts?

2014-08-19 Thread Gavin Huang
Thanks for the quick reply, it makes sense to me.




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com