Re: Yokozuna kv write timeouts on 1.4 (yz-merge-1.4.0)

2013-07-17 Thread Eric Redmond
Dave,

Your initial line was correct. Yokozuna is not yet compatible with 1.4.

Eric

On Jul 15, 2013, at 1:00 PM, Dave Martorana  wrote:

> Hi everyone. First post, if I leave anything out just let me know.
> 
> I have been using Vagrant to test Yokozuna with 1.3.0 (the official 0.7.0 
> "release") and it runs swimmingly. When 1.4 was released and someone pointed 
> me to the YZ integration branch, I decided to give it a go.
> 
> I realize that YZ probably doesn’t support 1.4 yet, but here are my 
> experiences.
> 
> - Installs fine
> - Using default stagedevrel with 5 node setup
> - Without yz enabled in app.config, kv accepts writes and reads
> - With yz enabled on dev1 and nowhere else, kv accepts writes and reads, 
> creates yz index, associates index with bucket, does not index content
> - With yz enabled on 4/5 nodes, kv stops accepting writes (timeout)
> 
> Ex:
> 
> (env)➜  curl -v -H 'content-type: text/plain' -XPUT 
> 'http://localhost:10018/buckets/players/keys/name' -d "Ryan Zezeski"
> * Adding handle: conn: 0x7f995a804000
> * Adding handle: send: 0
> * Adding handle: recv: 0
> * Curl_addHandleToPipeline: length: 1
> * - Conn 0 (0x7f995a804000) send_pipe: 1, recv_pipe: 0
> * About to connect() to localhost port 10018 (#0)
> *   Trying 127.0.0.1...
> * Connected to localhost (127.0.0.1) port 10018 (#0)
> > PUT /buckets/players/keys/name HTTP/1.1
> > User-Agent: curl/7.30.0
> > Host: localhost:10018
> > Accept: */*
> > content-type: text/plain
> > Content-Length: 12
> > 
> * upload completely sent off: 12 out of 12 bytes
> < HTTP/1.1 503 Service Unavailable
> < Vary: Accept-Encoding
> * Server MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue) is not 
> blacklisted
> < Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted it blue)
> < Date: Mon, 15 Jul 2013 19:54:50 GMT
> < Content-Type: text/plain
> < Content-Length: 18
> < 
> request timed out
> * Connection #0 to host localhost left intact
> 
> Here is my Vagrantfile:
> 
> https://gist.github.com/themartorana/460a52bb3f840010ecde
> 
> and build script for the server:
> 
> https://gist.github.com/themartorana/e2e0126c01b8ef01cc53
> 
> Hope this helps.
> 
> Dave
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Data population of Yokozuna on key-path in schema?

2013-07-17 Thread Eric Redmond
That's correct. The XML extractor nests by element name, separating elements by 
an underscore.
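
For example (hypothetical document, bucket, and port; this assumes the bucket
is already associated with an index):

    curl -XPUT 'http://localhost:8098/buckets/repos/keys/1' \
      -H 'content-type: application/xml' \
      -d '<commit><repo>eric/yokozuna</repo></commit>'

should be indexed with a field named commit_repo whose value is
"eric/yokozuna".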

Eric

On Jul 17, 2013, at 12:46 PM, Dave Martorana  wrote:

> Hi,
> 
> I realize I may be way off-base, but I noticed the following slide in Ryan’s 
> recent Ricon East talk on Yokozuna:
> 
> http://cl.ly/image/3s1b1v2w2x12
> 
> Does the schema pick out values based on key-path automatically? For 
> instance, 
> 
> val... 
> 
> automatically gets mapped to the "commit_repo" field definition for the 
> schema?
> 
> Thanks!
> 
> Dave
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Riak in Baltimore Thursday and Friday

2013-07-17 Thread Mark Phillips
Hey All,

I'll be in Baltimore tomorrow for the Riak Meetup happening at the OmniTI
offices [1]. If you're in the area, you should attend.

I'm also going to have a few hours of free time tomorrow and Friday if
anyone wants to get together and chat Riak.  Let me know if you're
interested.

Best,

Mark
twitter.com/pharkmillups

[1] http://www.meetup.com/Baltimore-Riak-Meetup/events/123958722/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Quickly deleting + recreating item in Riak deletes new item

2013-07-17 Thread Matthew Dawson
On July 17, 2013 08:45:01 AM Kelly McLaughlin wrote:
> Matthew,
> 
> I find it really surprising that you don't see any difference in behavior
> when you set delete_mode to keep. I think it would be helpful if you could
> outline your specific setup and give the steps to reproduce what you're
> seeing, so we can determine whether this represents a bug.
> Thanks.
> 
> Kelly
Hi Kelly,

Sure, no problem.  Hardware wise, I have:
 - An AMD Phenom II X6 Desktop with 16G memory, and an HDD with an SSD cache.
 - An Intel Ivy Bridge Dual Core (+HT) Laptop with 16G memory and SSD.
Both have lots of free memory + disk space for running my tests, and my 
Desktop never seems to be IO bound.  Both machines are connected over Ethernet 
on the same LAN.

On top of that hardware, both are running two instances of Riak each, all 
forming one 4 node cluster.  I'm using the default ring size of 64.  I've also 
upgraded all the nodes to the latest release, 1.4, using the 1.4 tag from Git.  
I'm not using this to seriously benchmark Riak, so I don't think this setup 
should cause any issues.  I'm also going to set up a real cluster for 
production use, so ring size is not a concern.
Each Riak instance uses LevelDB as the datastore, Riak Search is disabled.  
I'm using Riak's PB API for access, and I've bumped up the backlog parameter 
to 1024 for now.  Originally my program would connect to a single node, but 
recently I've been playing with HAProxy locally, and now I use that to connect 
to all four instances.  The problem existed before I implemented HAProxy.  
Riak Control is also enabled on one node per computer.

My application effectively stores two pieces of information in Riak.  
First it stores a list of keys associated with an object, and then stores an 
individual item at each key.  I limit the number of keys to 1 per object.

For my test suite, I automatically clean up after each test by listing all the 
keys associated with a bucket and then deleting each key individually.  I only 
store items in two buckets, so this cleans the slate before each run.
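
A rough equivalent of that cleanup over HTTP, with a made-up bucket name and
the default port, would be:

    # list all keys in the bucket (expensive, but fine for a test suite)
    curl 'http://localhost:8098/buckets/test_bucket/keys?keys=true'
    # then delete each returned key individually
    curl -XDELETE 'http://localhost:8098/buckets/test_bucket/keys/0'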

The test with the highest chance of failing tests how the system deals 
with inserting 1 items against one object.  The key list remains below 1M.  
Occasionally I see other tests fail, but I think this one fails more often as 
it stresses the entire system the most.  If I stop the automatic cleanup, the 
missing key is not findable via curl either.

Before posting, I would delete and insert keys, without using a vclock.  I had 
figured this was safe as I ran with allow_mult=true on both buckets, and I 
implemented conflict resolution first.  As suggested on this list, I now have 
the 1 item test suite use vclocks from start to finish.  However, I still 
see this behaviour.

I've attached a program (written in Go, as that is what I'm using) to this 
email which triggers the behaviour.  As far as I understand Riak, it is 
properly fetching vclocks whenever possible.  The library I'm using (located 
at: github.com/tpjg/goriakpbc ) was just recently updated to ensure that 
vclocks are fetched, even if the item is deleted.  I am using an up to date 
version of the library.  The program acts similarly to my app, but pared 
down as far as possible.  Note that this behaviour is unpredictable, and this 
program will sometimes execute fine.
I only tested this program against the default delete_mode setting.  Also, 
using HAProxy seems to trigger the issue far more readily, but it still 
happens without it.
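
For reference, the delete_mode in question is set in the riak_kv section of
app.config; the non-default variant Kelly suggested would look roughly like
this (a sketch, not my exact config):

    {riak_kv, [
        %% keep tombstones instead of reaping them after a delete
        {delete_mode, keep}
    ]}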


If there is any other information I can provide to help, let me know.

Thanks,
-- 
Matthew

package main

import riak "github.com/tpjg/goriakpbc"
import "fmt"
import "strconv"
import "sync"

func setupBucket(cli *riak.Client, bucketName string) error {
	bucket, err := cli.NewBucket(bucketName)
	if err != nil {
		return err
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		return err
	}
	return nil
}

const do_keys = 1

func main() {
	// Connect + setup bucket
	con := riak.NewClientPool("localhost:9000", 100)
	err := con.Connect()
	if err != nil {
		panic(err)
	}
	bucket, err := con.NewBucket("test_bucket_no_one_has")
	if err != nil {
		panic(err)
	}
	err = bucket.SetAllowMult(true)
	if err != nil {
		panic(err)
	}

	// Ok, first insert 1 items.
	wg := sync.WaitGroup{}
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
			item, err := bucket.Get(strconv.Itoa(i))
			if item == nil {
				panic(err)
			}
			item.Data = []byte("ASDF")
			err = item.Store()
			if err != nil {
				panic(err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done insert")

	// Verify items exist
	wg.Add(do_keys)
	for i := 0; i < do_keys; i++ {
		go func(i int) {
			defer wg.Done()
			_, err := bucket.Get(strconv.Itoa(i))
			if err != nil {
				fmt.Printf("Failed to fetch item %v err %s\n", i, err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("Done fetch")

	// And Delete
	keys, err := bucket.ListKeys()
	if err != nil {
		panic(err)
	}
	// The attachment is truncated here in the archive. Per the description
	// above, the remainder deletes each listed key individually; a minimal
	// reconstruction (bucket.Delete assumed from the goriakpbc API):
	for _, key := range keys {
		if err := bucket.Delete(string(key)); err != nil {
			fmt.Printf("Failed to delete key %s err %s\n", key, err)
		}
	}
	fmt.Println("Done delete")
}

Riak Search and Sorting

2013-07-17 Thread Jeremiah Peschka
I'm attempting to sort data with Riak Search and have run into a distinct
lack of sorting.

When using curl (The Fullest Featurest Riak Client EVAR™), I query the
following URL:
http://localhost:10038/solr/posts/select?q=title_txt:google&presort=key&sort=creation_dt&rows=500

Being aware that results are sorted AFTER filtering on the server side, I
adjusted my query to request more rows than needed: there are 335 rows that
meet my query criteria. However, Riak Search returns 10 results sorted by some
random criteria that I'm not aware of (it's not score, that's for sure).

Is this behavior expected? Is there something that I've missed in my query?

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Data population of Yokozuna on key-path in schema?

2013-07-17 Thread Dave Martorana
Hi,

I realize I may be way off-base, but I noticed the following slide in
Ryan’s recent Ricon East talk on Yokozuna:

http://cl.ly/image/3s1b1v2w2x12

Does the schema pick out values based on key-path automatically? For
instance,

val...

> automatically gets mapped to the "commit_repo" field definition for the
schema?

Thanks!

Dave
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: /usr/lib/riak/erts-5.9.1/bin/epmd -daemon

2013-07-17 Thread Patrick Durusau

Andrew,

Thanks!

Patrick

On 07/17/2013 11:25 AM, Andrew Thompson wrote:
> On Wed, Jul 17, 2013 at 11:23:03AM -0400, Patrick Durusau wrote:
>> 
>> Greetings!
>> 
>> I am at the start of configuring a multi-node development
>> environment with Riak 1.4 on Ubuntu 12.04. Riak was installed
>> using apt-get.
>> 
>> I stopped the one node test of Riak with sudo riak stop but ps
>> -ef | grep erlang shows:
>> 
>> /usr/lib/riak/erts-5.9.1/bin/epmd -daemon (owner of the process
>> is riak)
>> 
>> after I ran the stop command.
>> 
>> Is this expected behaviour?
>> 
>> Should I use kill to stop the daemon?
>> 
> 
> No, you can leave epmd alone, it is started the first time a
> distributed erlang node is started, but will survive that node's
> death. It is used to advertise what ports distributed erlang is
> using on a machine, so erlang nodes can communicate.
> 
> Andrew
> 
> ___ riak-users mailing
> list riak-users@lists.basho.com 
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 

-- 
Patrick Durusau
patr...@durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


/usr/lib/riak/erts-5.9.1/bin/epmd -daemon

2013-07-17 Thread Patrick Durusau

Greetings!

I am at the start of configuring a multi-node development environment
with Riak 1.4 on Ubuntu 12.04. Riak was installed using apt-get.

I stopped the one node test of Riak with sudo riak stop but
ps -ef | grep erlang shows:

/usr/lib/riak/erts-5.9.1/bin/epmd -daemon (owner of the process is riak)

after I ran the stop command.

Is this expected behaviour?

Should I use kill to stop the daemon?

Thanks!

Hope everyone is having a great day!

Patrick

-- 
Patrick Durusau
patr...@durusau.net
Technical Advisory Board, OASIS (TAB)
Former Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: /usr/lib/riak/erts-5.9.1/bin/epmd -daemon

2013-07-17 Thread Andrew Thompson
On Wed, Jul 17, 2013 at 11:23:03AM -0400, Patrick Durusau wrote:
> 
> Greetings!
> 
> I am at the start of configuring a multi-node development environment
> with Riak 1.4 on Ubuntu 12.04. Riak was installed using apt-get.
> 
> I stopped the one node test of Riak with sudo riak stop but
> ps -ef | grep erlang shows:
> 
> /usr/lib/riak/erts-5.9.1/bin/epmd -daemon (owner of the process is riak)
> 
> after I ran the stop command.
> 
> Is this expected behaviour?
> 
> Should I use kill to stop the daemon?
> 

No, you can leave epmd alone, it is started the first time a distributed
erlang node is started, but will survive that node's death. It is used
to advertise what ports distributed erlang is using on a machine, so
erlang nodes can communicate.
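
If you're curious, you can ask it what it has registered:

    epmd -names

which prints the locally registered node names and their ports.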

Andrew

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Quickly deleting + recreating item in Riak deletes new item

2013-07-17 Thread Kelly McLaughlin
Matthew,

I find it really surprising that you don't see any difference in behavior
when you set delete_mode to keep. I think it would be helpful if you could
outline your specific setup and give the steps to reproduce what you're
seeing, so we can determine whether this represents a bug.
Thanks.

Kelly


On Tue, Jul 16, 2013 at 11:33 PM, Matthew Dawson wrote:

> On July 16, 2013 12:00:36 PM Gabriel Littman wrote:
> > Please correct me if I'm wrong, but I think the right answer is doing a
> GET
> > first so you have a vector clock that is after the delete.  Then you
> should
> > be able to be sure your new write wins in any sibling resolution.
> >
> > Gabe
> >
> I would assume this to be the case.  I've even made sure my delete requests
> all have vclocks attached (by doing a GET first).  However I still see
> missing
> keys every so often.
>
> Making each key unique across runs of the test suite seems to solve
> things,
> but I don't like that as a solution.  And I'm not sure if this is an actual
> bug or not.  If this behaviour is considered a bug, I can try making a
> minimal
> reproducible test case and file a bug using that.
> --
> Matthew
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Add CentOs riak node to Existing RHEL node cluster

2013-07-17 Thread Seth Thomas
1. There should be no issue mixing nodes of the same release version across 
operating systems. That said, any performance differences between kernels and 
hardware could make the latencies a bit more unpredictable.

2. This has come up before so I'll link to the excellent response by Jon 
Meredith: 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2012-August/009133.html

We're currently on 1.4, though, so you may additionally want to step through 
the 1.3 and then 1.4 upgrades. Post-1.2 we have capability negotiation, so one 
can run mixed-version clusters. 
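
A rough per-node sketch of a rolling upgrade (the node name is a placeholder;
see the linked post for the authoritative steps):

    riak stop
    # install the newer Riak package for your platform
    riak start
    riak-admin wait-for-service riak_kv riak@node1.example.com
    riak-admin transfers    # wait for handoff to finish before the next node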

-- 
Seth Thomas
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, July 17, 2013 at 1:59 PM, raghwani sohil wrote:

> Hi ,
> 
> 1> 
> We have three node riak cluster ( 0.14.2 ) on RHEL 5.5 on production server. 
> 
> We have one riak node(0.14.2) on  Cent OS 6.4  on production server. 
> 
> Is it possible to add this one node(Cent OS) to existing three node 
> cluster(RHEL 5.5).
> 
> 2> Also What are the  steps to upgrade riak 0.14.2  to 1.2 ?  
> 
> thanks,
> Sohil 
> 
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com (mailto:riak-users@lists.basho.com)
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

2013-07-17 Thread gbrits
Each key-column value actually already is a rollup of a sparse matrix
(which is why the uncompressed key-column values are always exactly the
same length when they exist).
Having just watched that great talk (thanks), it's extremely similar to how
the guys at Boundary are rolling up their data. That validates the approach,
which is awesome!

Having just learned from that same talk that when using LevelDB keys don't
have to remain in memory, I'm just going with the logical 
as my new aggregated keys, each having a rolled-up sparse matrix as a value.
Hope that makes sense.

Anyway, this feels great!


2013/7/17 Sean Cribbs-2 [via Riak Users] <
ml-node+s197444n4028370...@n3.nabble.com>

> Just to add to Jeremiah's comments, I think you should consider whether
> you will be mostly retrieving:
>
> 1) all 1000 columns
> 2) some subset of columns
> 3) single columns
>
> That will greatly influence how you design your keyspace. Remember, with
> Riak it's just key-value in the end. This is one of my favorite examples of
> building a column-like system on top of pure key-value, Boundary's
> "Kobayashi" system: https://vimeo.com/42902962
>
>
> On Wed, Jul 17, 2013 at 7:25 AM, Jeremiah Peschka <[hidden email]> wrote:
>
>>
>>
>> --
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>> On Jul 17, 2013, at 4:38 AM, gbrits <[hidden email]> wrote:
>>
>> > Somewhere (can't find it now) I've read that Riak, like Cassandra could
>> be
>> > classified as a column store.
>>
>> That is incorrect. Riak is a key value database where the value is an
>> opaque blob.
>>
>> >
>> > This is just a name of course but what I understand from Cassandra is
>> that
>> > this allows for space-efficient encoding of column-values. Basically
>> storage
>> > is surrounded around columns instead of rows, allowing for different
>> > persistence strategies on a per-column, or column-family, basis.
>> Moreover,
>> > it would allow for zero storage overhead for non-existent column values.
>> > I.e: basically allowing for efficient storage of sparse data-sets.
>> >
>> > Does Riak have this property as well?
>>
>> No. Riak will happily store whatever you throw at it. That being said,
>> most good serialization libraries will leave off nullable properties.
>>
>> >
>>
>

>> > More specifically, I've got a datastructure on paper with the following
>> > properties, when mapped to riak nomenclature:
>> >
>> > - ~ 1.000.000 keys (will not grow)
>> > - ~ 1.000 columns.  (may grow)
>> > - 1 particular key has a median of ~50 columns. In other words the
>> entire
>> > set is ~ 95% sparse.
>> > - Wherever a key has a value for a particular column, that value is
>> always
>> > exactly a String (base 255) of 4KB length.
>> > - the 4KB values themselves are pretty 'sparse' so would benefit a lot
>> from
>> > run-length encoding. Is this supported out of the box?
>>
>> See above.
>>
>> >
>> > Given these properties how would Riak hold up? Hard to say of course,
>> but
>> > I'm looking for some general advice.
>>
>> Riak objects should be no more than ~10MB for performance reasons. You
>> should be safe.
>>
>> >
>> > Thanks.
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context:
>> http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
>> > Sent from the Riak Users mailing list archive at Nabble.com.
>> >
>> > ___
>> > riak-users mailing list
>> > [hidden email] 
>>
>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>> ___
>> riak-users mailing list
>> [hidden email] 
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>
>
>
> --
> Sean Cribbs <[hidden email]>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
>
> ___
> riak-users mailing list
> [hidden email] 
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

Add CentOs riak node to Existing RHEL node cluster

2013-07-17 Thread raghwani sohil
Hi,

1>
We have a three-node riak cluster (0.14.2) on RHEL 5.5 on a production
server.

We have one riak node (0.14.2) on CentOS 6.4 on a production server.

Is it possible to add this one node (CentOS) to the existing three-node
cluster (RHEL 5.5)?

2> Also, what are the steps to upgrade riak 0.14.2 to 1.2?

thanks,
Sohil
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

2013-07-17 Thread Sean Cribbs
Just to add to Jeremiah's comments, I think you should consider whether you
will be mostly retrieving:

1) all 1000 columns
2) some subset of columns
3) single columns

That will greatly influence how you design your keyspace. Remember, with
Riak it's just key-value in the end. This is one of my favorite examples of
building a column-like system on top of pure key-value, Boundary's
"Kobayashi" system: https://vimeo.com/42902962


On Wed, Jul 17, 2013 at 7:25 AM, Jeremiah Peschka <
jeremiah.pesc...@gmail.com> wrote:

>
>
> --
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
> MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
> On Jul 17, 2013, at 4:38 AM, gbrits  wrote:
>
> > Somewhere (can't find it now) I've read that Riak, like Cassandra could
> be
> > classified as a column store.
>
> That is incorrect. Riak is a key value database where the value is an
> opaque blob.
>
> >
> > This is just a name of course but what I understand from Cassandra is
> that
> > this allows for space-efficient encoding of column-values. Basically
> storage
> > is surrounded around columns instead of rows, allowing for different
> > persistence strategies on a per-column, or column-family, basis.
> Moreover,
> > it would allow for zero storage overhead for non-existent column values.
> > I.e: basically allowing for efficient storage of sparse data-sets.
> >
> > Does Riak have this property as well?
>
> No. Riak will happily store whatever you throw at it. That being said,
> most good serialization libraries will leave off nullable properties.
>
> >
> > More specifically, I've got a datastructure on paper with the following
> > properties, when mapped to riak nomenclature:
> >
> > - ~ 1.000.000 keys (will not grow)
> > - ~ 1.000 columns.  (may grow)
> > - 1 particular key has a median of ~50 columns. In other words the entire
> > set is ~ 95% sparse.
> > - Wherever a key has a value for a particular column, that value is
> always
> > exactly a String (base 255) of 4KB length.
> > - the 4KB values themselves are pretty 'sparse' so would benefit a lot
> from
> > run-length encoding. Is this supported out of the box?
>
> See above.
>
> >
> > Given these properties how would Riak hold up? Hard to say of course, but
> > I'm looking for some general advice.
>
> Riak objects should be no more than ~10MB for performance reasons. You
> should be safe.
>
> >
> > Thanks.
> >
> >
> >
> >
> > --
> > View this message in context:
> http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
> > Sent from the Riak Users mailing list archive at Nabble.com.
> >
> > ___
> > riak-users mailing list
> > riak-users@lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>



-- 
Sean Cribbs 
Software Engineer
Basho Technologies, Inc.
http://basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Does Riak support Range Queries over binary safe strings?

2013-07-17 Thread Sean Cribbs
On Wed, Jul 17, 2013 at 5:31 AM, gbrits  wrote:

> Sounds good to me. Just to confirm: the value which I want to get a slice
> from can be a bytearray encoded as a string? (base 255)?
>
>
Bytes are bytes to Riak. Your value is completely opaque.
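
For instance (bucket and key made up), a round trip returns the bytes
unchanged:

    curl -XPUT 'http://localhost:8098/buckets/blobs/keys/b1' \
      -H 'content-type: application/octet-stream' --data-binary @blob.bin
    curl 'http://localhost:8098/buckets/blobs/keys/b1' -o blob_copy.bin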


> Oh and almost forgot, I DO need to be able to parameterize the m/r (with
> param startSliceAt, endSliceAt + some other parameters that manage how
> aggregation of values is performed) .
> Being an absolute noob to m/r is this possible?
>
>
Each phase of a MapReduce job can receive arbitrary additional arguments. I
highly suggest you check out our tutorials on the matter:
http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/
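
A minimal sketch over HTTP (bucket, key, and the JavaScript body are made up;
the per-phase "arg" is what carries your parameters into the function):

    curl -XPOST 'http://localhost:8098/mapred' \
      -H 'content-type: application/json' -d '{
        "inputs": [["mybucket", "mykey"]],
        "query": [{"map": {
          "language": "javascript",
          "source": "function(v, keyData, arg){ return [v.values[0].data.substring(arg.startSliceAt, arg.endSliceAt)]; }",
          "arg": {"startSliceAt": 0, "endSliceAt": 4}
        }}]
      }'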


>
>
> 2013/7/17 Alexander Sicular [via Riak Users] <[hidden email]>
>
>> To the extent you limit your sliced data via an m/r you will reap those
>> savings on the wire when transferring back to the client
>>
>> You can feed an m/r from riak search, 2i or enumerated keys thereby
>> skipping a costly bucket scan.
>>
>>
>> @siculars
>> http://siculars.posthaven.com
>>
>> Sent from my iRotaryPhone
>>
>> On Jul 16, 2013, at 19:46, Jeremiah Peschka <[hidden email]> wrote:
>>
>> Not a problem.
>>
>> MapReduce across an entire keyspace is slow.
>>
>> MapReduce when provided with a few bucket/key pairs is the same as a
>> multi-get + processing.
>>
>> You can combine 2i + MR to get quick processing of data. Although, at
>> that point, you might as well just process your data on the client side.
>> Especially if you're just pulling out a slice of bytes.
>>
>> ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>>  MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>>
>> On Tue, Jul 16, 2013 at 4:13 PM, gbrits <[hidden email]> wrote:
>>
>>> Wow, high speed on this list!
>>>
>>> I wanted it for near realtime anyway so Map/reduce is out of the
>>> question. Thought somehow it could be done through Riak Search or directly
>>> on secondary indices instead of map/reduce.
>>> Guess not. Oh well, can't have it all.
>>>
>>> Thanks
>>>
>>>
>>> 2013/7/17 Jeremiah Peschka [via Riak Users] <[hidden email]>
>>>
 Following up on Alex's comments -

 If you know which bytes you need to slice, you can store this in a
 secondary index. You can perform range queries across secondary indices (as
 well as keys).

 As long as you're storing your data in a way that allows it to be read
 by either Erlang or JavaScript, you should be able to query over it in
 MapReduce. This is typically regarded as a Bad Idea™ since an MR query will
 need to scan all keys in a bucket (which effectively means scanning the
 entire cluster) and is best done as an infrequent activity to transform
 data.

 ---
 Jeremiah Peschka - Founder, Brent Ozar Unlimited
  MCITP: SQL Server 2008, MVP
 Cloudera Certified Developer for Apache Hadoop


 On Tue, Jul 16, 2013 at 3:45 PM, Alexander Sicular <[hidden email]> wrote:

> I would say no. Riak is generally oblivious as to the content of your
> data. Any ranges or other method you would use to query needs to be
> explicitly indexed via riak search or secondary indexes. Once you have
> found your data you could operate over that data in a map reduce, but I
> can't speak to "binary safe" blob operations in either erlang or 
> JavaScript
> although I'm inclined to say yes, you would be able to operate over it in
> m/r.
>
> So searching for keys with certain data in the binblob is probably not
> gonna happen but once you have a key to feed an m/r you could get a slice
> of that value.
>
> Make sense?
> -Alexander
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone
>
> On Jul 16, 2013, at 18:17, gbrits <[hidden email]> wrote:
>
> > First, hello all!
> >
> > Coming from Redis, I love that you can just put any binary blob in
> Redis
> > which is just treated as a string. This is possible because Redis
> strings
> > are what they call 'binary safe'. This makes it possible to return
> slices of
> > string-encoded binary data, which is super useful for
> bitset-operations,
> > etc.
> >
> > I'm investigating Riak and I like it a lot so far. Riak seems to
> have range
> > queries (on values, as it seems I must make that distinction with
> > column-stores), but I'm not sure if strings in Riak are "Binary
> safe" in the
> > above sense. If not, is there another way to store binary data

Re: Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

2013-07-17 Thread Jeremiah Peschka


--
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Jul 17, 2013, at 4:38 AM, gbrits  wrote:

> Somewhere (can't find it now) I've read that Riak, like Cassandra could be
> classified as a column store. 

That is incorrect. Riak is a key value database where the value is an opaque 
blob.

> 
> This is just a name of course but what I understand from Cassandra is that
> this allows for space-efficient encoding of column-values. Basically storage
> is surrounded around columns instead of rows, allowing for different
> persistence strategies on a per-column, or column-family, basis. Moreover,
> it would allow for zero storage overhead for non-existent column values.
> I.e: basically allowing for efficient storage of sparse data-sets.
> 
> Does Riak have this property as well?

No. Riak will happily store whatever you throw at it. That being said, most 
good serialization libraries will leave off nullable properties.
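
For example, a sparse row serialized as JSON (field names made up) need only
carry the columns that exist:

    {"colA": "...4KB string...", "colQ": "...4KB string..."}

so the ~95% of absent columns add nothing to the stored value.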

> 
> More specifically, I've got a datastructure on paper with the following
> properties, when mapped to riak nomenclature:
> 
> - ~ 1.000.000 keys (will not grow)
> - ~ 1.000 columns.  (may grow)
> - 1 particular key has a median of ~50 columns. In other words the entire
> set is ~ 95% sparse.
> - Wherever a key has a value for a particular column, that value is always
> exactly a String (base 255) of 4KB length.
> - the 4KB values themselves are pretty 'sparse' so would benefit a lot from
> run-length encoding. Is this supported out of the box?

See above.

> 
> Given these properties how would Riak hold up? Hard to say of course, but
> I'm looking for some general advice. 

Riak objects should be no more than ~10MB for performance reasons. You should 
be safe. 

> 
> Thanks. 
> 
> 
> 
> 
> --
> View this message in context: 
> http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
> Sent from the Riak Users mailing list archive at Nabble.com.
> 
> ___
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Lots of sparse columns. Efficient like Cassandra? Some measures of my dataset

2013-07-17 Thread gbrits
Somewhere (can't find it now) I've read that Riak, like Cassandra, could be
classified as a column store. 

This is just a name of course, but what I understand from Cassandra is that
this allows for space-efficient encoding of column-values. Basically, storage
is organized around columns instead of rows, allowing for different
persistence strategies on a per-column, or column-family, basis. Moreover,
it would allow for zero storage overhead for non-existent column values,
i.e. basically allowing for efficient storage of sparse data-sets.

Does Riak have this property as well?

More specifically, I've got a datastructure on paper with the following
properties, when mapped to riak nomenclature:

- ~ 1.000.000 keys (will not grow)
- ~ 1.000 columns.  (may grow)
- 1 particular key has a median of ~50 columns. In other words the entire
set is ~ 95% sparse.
- Wherever a key has a value for a particular column, that value is always
exactly a String (base 255) of 4KB length.
- the 4KB values themselves are pretty 'sparse' so would benefit a lot from
run-length encoding. Is this supported out of the box?

Given these properties how would Riak hold up? Hard to say of course, but
I'm looking for some general advice. 

Thanks. 




--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Lots-of-sparse-columns-Efficient-like-Cassandra-Some-measures-of-my-dataset-tp4028367.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Does Riak support Range Queries over binary safe strings?

2013-07-17 Thread gbrits
Sounds good to me. Just to confirm: the value which I want to get a slice
from can be a bytearray encoded as a string? (base 255)?

Oh and almost forgot, I DO need to be able to parameterize the m/r (with
param startSliceAt, endSliceAt + some other parameters that manage how
aggregation of values is performed) .
Being an absolute noob to m/r is this possible?




2013/7/17 Alexander Sicular [via Riak Users] <
ml-node+s197444n4028365...@n3.nabble.com>

> To the extent you limit your sliced data via an m/r you will reap those
> savings on the wire when transferring back to the client
>
> You can feed an m/r from riak search, 2i or enumerated keys thereby
> skipping a costly bucket scan.
>
>
> @siculars
> http://siculars.posthaven.com
>
> Sent from my iRotaryPhone
>
> On Jul 16, 2013, at 19:46, Jeremiah Peschka <[hidden email]> wrote:
>
> Not a problem.
>
> MapReduce across an entire keyspace is slow.
>
> MapReduce when provided with a few bucket/key pairs is the same as a
> multi-get + processing.
>
> You can combine 2i + MR to get quick processing of data. Although, at that
> point, you might as well just process your data on the client side.
> Especially if you're just pulling out a slice of bytes.
>
> ---
> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>  MCITP: SQL Server 2008, MVP
> Cloudera Certified Developer for Apache Hadoop
>
>
> On Tue, Jul 16, 2013 at 4:13 PM, gbrits <[hidden email]> wrote:
>
>> Wow, high speed on this list!
>>
>> I wanted it for near realtime anyway so Map/reduce is out of the
>> question. Thought somehow it could be done through Riak Search or directly
>> on secondary indices instead of map/reduce.
>> Guess not. Oh well, can't have it all.
>>
>> Thanks
>>
>>
>> 2013/7/17 Jeremiah Peschka [via Riak Users] <[hidden email]>
>>
>>> Following up on Alex's comments -
>>>
>>> If you know which bytes you need to slice, you can store this in a
>>> secondary index. You can perform range queries across secondary indices (as
>>> well as keys).
>>>
>>> As long as you're storing your data in a way that allows it to be read
>>> by either Erlang or JavaScript, you should be able to query over it in
>>> MapReduce. This is typically regarded as a Bad Idea™ since an MR query will
>>> need to scan all keys in a bucket (which effectively means scanning the
>>> entire cluster) and is best done as an infrequent activity to transform
>>> data.
>>>
>>> ---
>>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>>>  MCITP: SQL Server 2008, MVP
>>> Cloudera Certified Developer for Apache Hadoop
>>>
>>>
>>> On Tue, Jul 16, 2013 at 3:45 PM, Alexander Sicular <[hidden email]> wrote:
>>>
 I would say no. Riak is generally oblivious as to the content of your
 data. Any ranges or other method you would use to query needs to be
 explicitly indexed via riak search or secondary indexes. Once you have
 found your data you could operate over that data in a map reduce, but I
 can't speak to "binary safe" blob operations in either erlang or JavaScript
 although I'm inclined to say yes, you would be able to operate over it in
 m/r.

 So searching for keys with certain data in the binblob is probably not
 gonna happen but once you have a key to feed an m/r you could get a slice
 of that value.

 Make sense?
 -Alexander

 @siculars
 http://siculars.posthaven.com

 Sent from my iRotaryPhone

 On Jul 16, 2013, at 18:17, gbrits <[hidden email]> wrote:

 > First, hello all!
 >
 > Coming from Redis, I love that you can just put any binary blob in
 Redis
 > which is just treated as a string. This is possible because Redis
 strings
 > are what they call 'binary safe'. This makes it possible to return
 slices of
 > string-encoded binary data, which is super useful for
 bitset-operations,
 > etc.
 >
 > I'm investigating Riak and I like it a lot so far. Riak seems to have
 range
 > queries (on values, as it seems I must make that distinction with
 > column-stores), but I'm not sure if strings in Riak are "Binary safe"
 in the
 > above sense. If not, is there another way to store binary data in
 Riak and
 > still do range queries over them quickly?
 >
 > To be exact: I want to do multi-key lookups in Riak, where each
 returned
 > result should be of format: 
 >
 > Thanks,
 > Geert-Jan
 >
 >
 >
 > --
 > View this message in context:
 http://riak-users.197444.n3.nabble.com/Does-Riak-support-Range-Queries-over-binary-safe-strings-tp4028356.html
 > Sent from the Riak Users mailing list archive at Nabble.com.
>