Secondary index reverse sort

2013-07-31 Thread Lucas Cooper
I've seen that the results of secondary index queries are sorted on index
values by default.
I was wondering if there's something I'm missing that would allow me to
fetch those keys but reverse sorted.

I have indexes based on UNIX timestamps and I'd like to grab the most
recent keys.
I'd like this query to be running on demand so I'd like to avoid MapReduce
if at all possible.
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Reuse of Buckets (Java Client)

2013-07-31 Thread Nico Huysamen
Is the Bucket class reusable and thread-safe? I.e. can I create my 
Bucket objects during instantiation of the client class and then reuse the 
same Bucket for all operations over the application's lifetime? Or should 
buckets be re-created for each request?


Thanks
--
Nico Huysamen
Senior Software Developer | Ad Dynamo


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Reuse of Buckets (Java Client)

2013-07-31 Thread Guido Medina
Yes, it is thread-safe; you can treat them as singleton instances per 
bucket. The general usage pattern is roughly:


 * Fetch the bucket.
 * Optional: if it exists, verify it carries your application's settings
   (N value, etc.).
 * If it doesn't exist, create it with your settings.
 * Cache it as a singleton instance (you could create a final
   Map<String, Bucket> buckets = new HashMap<String, Bucket>()) and re-use
   it in your application, assuming your initialization is not lazy; if it
   is, use proper thread-safe initialization, as in the sketch below.
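
A minimal sketch of such a cache, assuming the 1.x Java client API (the
class and exception names here are from memory, so treat them as
assumptions):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.RiakRetryFailedException;
    import com.basho.riak.client.bucket.Bucket;

    public class BucketCache {
        private final IRiakClient client;
        // ConcurrentMap makes lazy initialization thread-safe without locking.
        private final ConcurrentMap<String, Bucket> buckets =
                new ConcurrentHashMap<String, Bucket>();

        public BucketCache(IRiakClient client) {
            this.client = client;
        }

        public Bucket get(String name) throws RiakRetryFailedException {
            Bucket bucket = buckets.get(name);
            if (bucket == null) {
                // Two threads may race here; putIfAbsent ensures both end up
                // with the same instance, and the losing fetch is harmless.
                bucket = client.fetchBucket(name).execute();
                Bucket existing = buckets.putIfAbsent(name, bucket);
                if (existing != null) {
                    bucket = existing;
                }
            }
            return bucket;
        }
    }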

Hope that helps,

Guido.

On 31/07/13 08:45, Nico Huysamen wrote:
Is the Bucket class reusable and thread-safe? I.e. can I create my 
Bucket objects during instantiation of client class, and then reuse 
the same bucket for all operations for the application lifetime? Or 
should buckets be re-created for each request?


Thanks


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: combining Riak (CS) and Spark/shark by speaking over s3 protocol

2013-07-31 Thread Geert-Jan Brits
Dan,

Not sure I understand the renaming-objects problem in Riak CS. Can you
elaborate?

You wrote: I believe smaller object sizes would not be nearly as efficient
as working with plain Riak if only because of the overhead incurred by Riak
CS. Does this mean lack of efficiency in disk storage, in memory, or both?
Moreover, I have this nagging thought that having to dig through the
manifest to find the blocks will severely impact read latency for
(multi-key) lookups, compared to a normal Bitcask / LevelDB lookup.  Is
this correct?

Best,
Geert-Jan


2013/7/30 Dan Kerrigan dan.kerri...@gmail.com

 Geert-Jan -

 We're currently working on a somewhat similar project to integrate Flume
 to ingest data into Riak CS for later processing using Hadoop.  The
 limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
 revolve around renaming objects (copy/delete) in Riak CS.  If you can avoid
 that, this approach should work fine.

 Regarding how data is stored in Riak CS, the data block storage is Bitcask
 with manifest storage being held in LevelDB.  Riak CS is optimized for
 larger object sizes and I believe smaller object sizes would not be nearly
 as efficient as working with plain Riak if only because of the overhead
 incurred by Riak CS. The benefits of Riak generally carry over to Riak CS
 so there shouldn't be any need to worry about losing raw power.

 Respectfully -
 Dan Kerrigan


 On Tue, Jul 30, 2013 at 2:21 PM, gbrits gbr...@gmail.com wrote:

 This may be totally missing the mark, but I've been reading up on ways to do
 fast iterative processing in Storm or Spark/Shark, with the ultimate goal of
 results ending up in Riak for fast multi-key retrieval.

 I want this setup to be as lean as possible for obvious reasons, so I've
 started to look more closely at the possible Riak CS / Spark combo.

 Apparently (please correct me if wrong) Riak CS sits on top of Riak and is
 S3-API compliant. Underlying the DB for the objects is LevelDB (which would
 have been my choice anyway, because of the low in-memory key overhead).
 Apparently Bitcask is also used, although it's not clear to me what for
 exactly.

 At the same time, Spark (with Shark on top, which is to Spark what Hive is
 to Hadoop, if that in any way makes things clearer) can use HDFS or S3 as
 its so-called 'deep store'.

 Combining these, it seems Riak CS and Spark/Shark could be a pretty tight
 combo, providing iterative and ad-hoc querying through Shark plus all the
 excellent stuff of Riak, over the S3 protocol which they both speak.

 Is this correct?
 Would I lose any of the raw power of Riak when going with Riak CS? Has
 anyone ever tried this combo?

 Thanks,
 Geert-Jan




 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: combining Riak (CS) and Spark/shark by speaking over s3 protocol

2013-07-31 Thread gbrits
Thanks for the links, Mark. It certainly looks possible to me; a Riak +
Spark/Shark setup almost looks like a match made in heaven. So I'm doing my
due diligence before getting too excited, since there's not much prior work
on combining the two, which suggests I might be overlooking something.
Going to try the setup and see what comes out.


2013/7/31 Mark Hamstra [via Riak Users] 
ml-node+s197444n402862...@n3.nabble.com

 Others have certainly found benefits in combining Spark/Shark with a
 Dynamo-type KV store.  With robust Hadoop Input/OutputFormats it's not too
 difficult (e.g. see
 http://www.slideshare.net/EvanChan2/cassandra2013-spark-talk-final and
 http://tuplejump.github.io/calliope/), and it may be possible to do as you
 suggest with the S3 API of Riak CS.  What may also be worth exploring is
 whether Riak and Spark/Shark can rendezvous via Tachyon
 (https://github.com/amplab/tachyon/wiki). That would be more of a research
 project right now, but it could end up someplace interesting.


 On Tue, Jul 30, 2013 at 1:24 PM, Dan Kerrigan wrote:

 Geert-Jan -

 We're currently working on a somewhat similar project to integrate Flume
 to ingest data into Riak CS for later processing using Hadoop.  The
 limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
 revolve around renaming objects (copy/delete) in Riak CS.  If you can avoid
 that, this link should work fine.

 Regarding how data is stored in Riak CS, the data block storage is
 Bitcask with manifest storage being held in LevelDB.  Riak CS is optimized
 for larger object sizes and I believe smaller object sizes would not be
 nearly as efficient as working with plain Riak if only because of the
 overhead incurred by Riak CS. The benefits of Riak generally carry over to
 Riak CS so there shouldn't be any need to worry about losing raw power.

 Respectfully -
 Dan Kerrigan


 On Tue, Jul 30, 2013 at 2:21 PM, gbrits wrote:

 This may be totally missing the mark but I've been reading up on ways to
 do
 fast iterative processing in Storm or Spark/shark, with the ultimate
 goal of
 results ending up in Riak for fast multi-key retrieval.

 I want this setup to be as lean as possible for obvious reasons so I've
 started to look more closely at the possible Riak CS / Spark combo.

 Apparently, please correct if wrong, Riak CS sits on top of Riak and is
 S3-api compliant. Underlying the db for the objects is levelDB (which
 would
 have been my choice anyway, bc of the low in-mem key overhead) Apparently
 Bitcask is also used, although it's not clear to me what for exactly.

 At the same time Spark (with Shark on top, which is what Hive is for
 Hadoop
 if that in any way makes things clearer) can use HDFS or S3 as it's so
 called 'deep store'.

 Combining this it seems, Riak CS and Spark/Shark could be a nice pretty
 tight combo providing interative and adhoc quering through Shark + all
 the
 excellent stuff of Riak through the S3 protocol which they both speak .

 Is this correct?
 Would I loose any of the raw power of Riak when going with Riak CS?
 Anyone
 ever tried this combo?

 Thanks,
 Geert-Jan






___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: Secondary index reverse sort

2013-07-31 Thread Russell Brown
Hi Lucas,

I'm sorry, as easy as it would have been to add with the latest changes, we 
just ran out of time.

It is something I'd love to add in future. Or maybe something a contributor 
could add? (Happy to advise / review.)

Many thanks

Russell

On 31 Jul 2013, at 02:04, Lucas Cooper bobobo1...@gmail.com wrote:

 I've seen that the results of secondary index queries are sorted on index 
 values by default.
 I was wondering if there's something I'm missing that would allow me to fetch 
 those keys but reverse sorted.
 
 I have indexes based on UNIX timestamps and I'd like to grab the most recent 
 keys.
 I'd like this query to be running on demand so I'd like to avoid MapReduce if 
 at all possible.
 
 
 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary index reverse sort

2013-07-31 Thread Lucas Cooper
I'm happy to wait; it isn't urgently needed, as my project is still in
development.

I'd contribute myself if I were at all confident programming in Erlang, but
I'm still just getting into declarative languages at the moment :)

On Wed, Jul 31, 2013 at 10:03 PM, Russell Brown russell.br...@mac.com wrote:

 Hi Lucas,

 I'm sorry, as easy as it would have been to add with the latest changes,
 we just ran out of time.

 It is something I'd love to add in future. Or maybe something a
 contributor could add? (Happy to advise / review.)

 Many thanks

 Russell

 On 31 Jul 2013, at 02:04, Lucas Cooper bobobo1...@gmail.com wrote:

  I've seen that the results of secondary index queries are sorted on
 index values by default.
  I was wondering if there's something I'm missing that would allow me to
 fetch those keys but reverse sorted.
 
  I have indexes based on UNIX timestamps and I'd like to grab the most
 recent keys.
  I'd like this query to be running on demand so I'd like to avoid
 MapReduce if at all possible.
 
 
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary index reverse sort

2013-07-31 Thread Jon Meredith
As a workaround, you can always store (2^31 - timestamp) as an additional
index and use that when you need to do the reverse retrieval.  Beware 2038.
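
A sketch of that workaround with the 1.x Java client (the bucket, key and
index name are placeholders, and the builder calls are my assumptions, so
double-check them against the client you're using):

    import com.basho.riak.client.IRiakObject;
    import com.basho.riak.client.builders.RiakObjectBuilder;

    public class InvertedTimestampIndex {
        public static void main(String[] args) {
            long ts = System.currentTimeMillis() / 1000L;  // UNIX seconds
            long inverted = (1L << 31) - ts;               // 2^31 - timestamp; breaks past 2038
            IRiakObject obj = RiakObjectBuilder.newBuilder("events", "event-1")
                    .withValue("payload")
                    .addIndex("created_desc_int", (int) inverted)
                    .build();
            // An ascending 2i range query on created_desc_int now yields
            // newest-first: query the range [2^31 - newest, 2^31 - oldest].
        }
    }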

Jon


On Wed, Jul 31, 2013 at 1:07 PM, Lucas Cooper bobobo1...@gmail.com wrote:

 I'm happy to wait, it isn't urgently needed as my project is still in
 development.

 I'd contribute myself if I was confident at all programming in Erlang but
 I'm still just getting into declarative languages at the moment :)


 On Wed, Jul 31, 2013 at 10:03 PM, Russell Brown russell.br...@mac.com wrote:

 Hi Lucas,

 I'm sorry, as easy as it would have been to add with the latest changes,
 we just ran out of time.

 It is something I'd love to add in future. Or maybe something a
 contributor could add? (Happy to advise / review.)

 Many thanks

 Russell

 On 31 Jul 2013, at 02:04, Lucas Cooper bobobo1...@gmail.com wrote:

  I've seen that the results of secondary index queries are sorted on
 index values by default.
  I was wondering if there's something I'm missing that would allow me to
 fetch those keys but reverse sorted.
 
  I have indexes based on UNIX timestamps and I'd like to grab the most
 recent keys.
  I'd like this query to be running on demand so I'd like to avoid
 MapReduce if at all possible.
 
 
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




-- 
Jon Meredith
VP, Engineering
Basho Technologies, Inc.
jmered...@basho.com
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: combining Riak (CS) and Spark/shark by speaking over s3 protocol

2013-07-31 Thread Dan Kerrigan
Geert-Jan -

Riak CS currently doesn't support the S3 Copy command.  Flume and Hadoop
distcp create a temporary object and then attempt to Copy that object to
its permanent location.  A rename is a Copy followed by a Delete, since the
S3 API doesn't support Rename directly.
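
To make that concrete, here is a sketch using the AWS SDK for Java pointed
at a Riak CS endpoint; the endpoint URL, credentials, bucket and keys are
placeholders of my own invention:

    import java.io.File;

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;

    public class CsRenameDemo {
        public static void main(String[] args) {
            AmazonS3Client s3 = new AmazonS3Client(
                    new BasicAWSCredentials("cs-access-key", "cs-secret-key"));
            // Riak CS exposes the S3 API on its own endpoint.
            s3.setEndpoint("http://riak-cs.example.com:8080");

            // A plain PUT works fine.
            s3.putObject("demo-bucket", "tmp/part-0000", new File("part-0000"));

            // Tools like Hadoop distcp rename by copy-then-delete. The copy
            // half uses the S3 Copy operation, which Riak CS doesn't support,
            // so the commented line below is where such tools fall over:
            // s3.copyObject("demo-bucket", "tmp/part-0000",
            //               "demo-bucket", "final/part-0000");
            s3.deleteObject("demo-bucket", "tmp/part-0000");
        }
    }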

Regarding efficiency, Riak CS block sizes are 1MB (so a 100 MB object becomes
100 objects stored in Bitcask), and you can use the Bitcask calculator at [0]
to get a rough estimate of the requirements for storing your particular
dataset. Regarding the impact on read latency, severe is probably not the
right word, but there is an impact.  Besides API support, your decision will,
in part, come down to how large your objects are going to be.  The Riak FAQ
[1] currently suggests that Riak object sizes should be less than 10MB; Riak
CS, on the other hand, can handle object sizes up to 5TB.  If you are doing
multi-key retrievals of lots of small objects, Riak looks like the right
choice; otherwise, go with Riak CS.  Some basic testing would go a long way
toward finding the balance in your case.

Respectfully -
Dan Kerrigan

[0] http://docs.basho.com/riak/latest/ops/building/planning/bitcask/
[1]
http://docs.basho.com/riak/latest/community/faqs/developing/#is-there-a-limit-on-the-file-size-that-can-be-stor



On Wed, Jul 31, 2013 at 4:43 AM, Geert-Jan Brits gbr...@gmail.com wrote:

 Dan,

 Not sure if I understand the renaming objects-problem in Riak CS. Can
 you elaborate?

  I believe smaller object sizes would not be nearly as efficient as
 working with plain Riak if only because of the overhead incurred by Riak
 CS. Does this mean lack of efficiency in disk storage, in-mem or both?
 Moreover I'm having this nagging thought that having to dig through the
 manifest to find the blocks will severely impact read latency for
 (multi-key) lookups as opposed to the normal bitcask / levelDB lookup.  Is
 this correct?

 Best,
 Geert-Jan


 2013/7/30 Dan Kerrigan dan.kerri...@gmail.com

 Geert-Jan -

 We're currently working on a somewhat similar project to integrate Flume
 to ingest data into Riak CS for later processing using Hadoop.  The
 limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
 revolve around renaming objects (copy/delete) in Riak CS.  If you can avoid
 that, this link should work fine.

 Regarding how data is stored in Riak CS, the data block storage is
 Bitcask with manifest storage being held in LevelDB.  Riak CS is optimized
 for larger object sizes and I believe smaller object sizes would not be
 nearly as efficient as working with plain Riak if only because of the
 overhead incurred by Riak CS. The benefits of Riak generally carry over to
 Riak CS so there shouldn't be any need to worry about losing raw power.

 Respectfully -
 Dan Kerrigan


 On Tue, Jul 30, 2013 at 2:21 PM, gbrits gbr...@gmail.com wrote:

 This may be totally missing the mark but I've been reading up on ways to
 do
 fast iterative processing in Storm or Spark/shark, with the ultimate
 goal of
 results ending up in Riak for fast multi-key retrieval.

 I want this setup to be as lean as possible for obvious reasons so I've
 started to look more closely at the possible Riak CS / Spark combo.

 Apparently, please correct if wrong, Riak CS sits on top of Riak and is
 S3-api compliant. Underlying the db for the objects is levelDB (which
 would
 have been my choice anyway, bc of the low in-mem key overhead) Apparently
 Bitcask is also used, although it's not clear to me what for exactly.

 At the same time Spark (with Shark on top, which is what Hive is for
 Hadoop
 if that in any way makes things clearer) can use HDFS or S3 as it's so
 called 'deep store'.

 Combining this it seems, Riak CS and Spark/Shark could be a nice pretty
 tight combo providing interative and adhoc quering through Shark + all
 the
 excellent stuff of Riak through the S3 protocol which they both speak .

 Is this correct?
 Would I loose any of the raw power of Riak when going with Riak CS?
 Anyone
 ever tried this combo?

 Thanks,
 Geert-Jan




 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Secondary index reverse sort

2013-07-31 Thread Lucas Cooper
That should work well in development for my application actually. Thanks
for the tip!

On Wed, Jul 31, 2013 at 10:26 PM, Jon Meredith jmered...@basho.com wrote:

 As a workaround, you can always store (2^31 - timestamp) as an additional
 index and use that when you need to do the reverse retrieval.  Beware 2038.

 Jon


 On Wed, Jul 31, 2013 at 1:07 PM, Lucas Cooper bobobo1...@gmail.com wrote:

 I'm happy to wait, it isn't urgently needed as my project is still in
 development.

 I'd contribute myself if I was confident at all programming in Erlang but
 I'm still just getting into declarative languages at the moment :)


 On Wed, Jul 31, 2013 at 10:03 PM, Russell Brown russell.br...@mac.com wrote:

 Hi Lucas,

 I'm sorry, as easy as it would have been to add with the latest changes,
 we just ran out of time.

 It is something I'd love to add in future. Or maybe something a
 contributor could add? (Happy to advise / review.)

 Many thanks

 Russell

 On 31 Jul 2013, at 02:04, Lucas Cooper bobobo1...@gmail.com wrote:

  I've seen that the results of secondary index queries are sorted on
 index values by default.
  I was wondering if there's something I'm missing that would allow me
 to fetch those keys but reverse sorted.
 
  I have indexes based on UNIX timestamps and I'd like to grab the most
 recent keys.
  I'd like this query to be running on demand so I'd like to avoid
 MapReduce if at all possible.
 
 
  ___
  riak-users mailing list
  riak-users@lists.basho.com
  http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




 --
 Jon Meredith
 VP, Engineering
 Basho Technologies, Inc.
 jmered...@basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Installing protobuf 2.5.0 for Riak Python 2.0

2013-07-31 Thread Sean Cribbs
Matt,

For compatibility reasons, we use 2.4.1, which is pinned in the
requirements of the riak_pb package. We intend to move to 2.5 for later
releases.
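
In the meantime, installing the pinned version directly should get you
going (check riak_pb's requirements file for the exact pin):

    pip install protobuf==2.4.1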


On Tue, Jul 30, 2013 at 11:02 PM, Matt Black matt.bl...@jbadigital.com wrote:

 Hello list,

 I've been eagerly awaiting the latest update to the python bindings, so it
 was with great enthusiasm that I started on it this morning!

 However, I'm unable to install the latest v2.5 of protobuf. Has anyone
 else had problems? Presumably it works for others on different setups.
 (Sean?)

 (test)vagrant@boomerang:/tmp/protobuf-2.5.0/python $ python setup.py build
 running build
 running build_py
 Generating google/protobuf/unittest_pb2.py...
 google/protobuf/unittest_import.proto:53:8: Expected a string naming the
 file to import.
 google/protobuf/unittest.proto: Import
 google/protobuf/unittest_import.proto was not found or had errors.
 google/protobuf/unittest.proto:97:12:
 protobuf_unittest_import.ImportMessage is not defined.
 google/protobuf/unittest.proto:101:12:
 protobuf_unittest_import.ImportEnum is not defined.
 google/protobuf/unittest.proto:107:12:
 protobuf_unittest_import.PublicImportMessage is not defined.
 google/protobuf/unittest.proto:135:12:
 protobuf_unittest_import.ImportMessage is not defined.
 google/protobuf/unittest.proto:139:12:
 protobuf_unittest_import.ImportEnum is not defined.
 google/protobuf/unittest.proto:165:12:
 protobuf_unittest_import.ImportEnum is not defined.
 google/protobuf/unittest.proto:216:12:
 protobuf_unittest_import.ImportMessage is not defined.
 google/protobuf/unittest.proto:221:12:
 protobuf_unittest_import.ImportEnum is not defined.
 google/protobuf/unittest.proto:227:12:
 protobuf_unittest_import.PublicImportMessage is not defined.
 google/protobuf/unittest.proto:256:12:
 protobuf_unittest_import.ImportMessage is not defined.
 google/protobuf/unittest.proto:261:12:
 protobuf_unittest_import.ImportEnum is not defined.
 google/protobuf/unittest.proto:291:12:
 protobuf_unittest_import.ImportEnum is not defined.

 vagrant@boomerang:~ $ uname -a
 Linux boomerang 3.2.0-29-virtual #46-Ubuntu SMP Fri Jul 27 17:23:50 UTC
 2012 x86_64 x86_64 x86_64 GNU/Linux

 vagrant@boomerang:~ $ lsb_release -a
 No LSB modules are available.
 Distributor ID: Ubuntu
 Description:Ubuntu 12.04.2 LTS
 Release:12.04
 Codename:   precise

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




-- 
Sean Cribbs s...@basho.com
Software Engineer
Basho Technologies, Inc.
http://basho.com/
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: combining Riak (CS) and Spark/shark by speaking over s3 protocol

2013-07-31 Thread gbrits
I appreciate the clarification. The object size for the dataset I'm currently
investigating is 8KB (to the bit, exactly), so that would mean loads of
overhead going the Riak CS route.
Upfront I had already figured that Riak directly was the more efficient way
to go, but getting a nice Riak + Spark/Shark integration going (through S3)
is worth a lot to me as well. Some experimenting to do, I guess :)

Thanks,
Geert-Jan


2013/7/31 Dan Kerrigan [via Riak Users] 
ml-node+s197444n4028644...@n3.nabble.com

 Geert-Jan

 Riak CS currently doesn't support the S3 Copy command.  Flume and Hadoop
 distcp create a temporary object and then attempts to Copy that object to
 it's permanent location.  Rename is a Copy then a Delete since the S3 API
 doesn't support Rename.

 Regarding efficiency, Riak CS block sizes are 1MB (100 MB object, 100 Riak
 Bitcask stored objects) so you can use the Bitcask calculator at [0] to get
 a rough estimate requirements to store your particular dataset. Regarding
 the impact to read latency, severe is probably not the right word but there
 is an impact.  Besides API support, your decision will, in part, come down
 to how large your object sizes are going to be.  The Riak FAQ [1] currently
 suggests that Riak Object sizes should be less than 10MB.  Riak CS on the
 other hand can handle object sizes up to 5TB.  If you are doing multi-key
 retrieves for lots of small objects, Riak looks like the right choice
 otherwise, go with Riak CS.  Some basic testing would go a long way to find
 the balance in your case.

 Respectfully -
 Dan Kerrigan

 [0] http://docs.basho.com/riak/latest/ops/building/planning/bitcask/
 [1]
 http://docs.basho.com/riak/latest/community/faqs/developing/#is-there-a-limit-on-the-file-size-that-can-be-stor



 On Wed, Jul 31, 2013 at 4:43 AM, Geert-Jan Brits wrote:

 Dan,

 Not sure if I understand the renaming objects-problem in Riak CS. Can
 you elaborate?

  I believe smaller object sizes would not be nearly as efficient as
 working with plain Riak if only because of the overhead incurred by Riak
 CS. Does this mean lack of efficiency in disk storage, in-mem or both?
 Moreover I'm having this nagging thought that having to dig through the
 manifest to find the blocks will severely impact read latency for
 (multi-key) lookups as opposed to the normal bitcask / levelDB lookup.  Is
 this correct?

 Best,
 Geert-Jan


 2013/7/30 Dan Kerrigan

 Geert-Jan -

 We're currently working on a somewhat similar project to integrate Flume
 to ingest data into Riak CS for later processing using Hadoop.  The
 limitations of HDFS/S3, when using the s3:// or s3n:// URIs, seem to
 revolve around renaming objects (copy/delete) in Riak CS.  If you can avoid
 that, this link should work fine.

 Regarding how data is stored in Riak CS, the data block storage is
 Bitcask with manifest storage being held in LevelDB.  Riak CS is optimized
 for larger object sizes and I believe smaller object sizes would not be
 nearly as efficient as working with plain Riak if only because of the
 overhead incurred by Riak CS. The benefits of Riak generally carry over to
 Riak CS so there shouldn't be any need to worry about losing raw power.

 Respectfully -
 Dan Kerrigan


 On Tue, Jul 30, 2013 at 2:21 PM, gbrits wrote:

 This may be totally missing the mark but I've been reading up on ways
 to do
 fast iterative processing in Storm or Spark/shark, with the ultimate
 goal of
 results ending up in Riak for fast multi-key retrieval.

 I want this setup to be as lean as possible for obvious reasons so I've
 started to look more closely at the possible Riak CS / Spark combo.

 Apparently, please correct if wrong, Riak CS sits on top of Riak and is
 S3-api compliant. Underlying the db for the objects is levelDB (which
 would
 have been my choice anyway, bc of the low in-mem key overhead)
 Apparently
 Bitcask is also used, although it's not clear to me what for exactly.

 At the same time Spark (with Shark on top, which is what Hive is for
 Hadoop
 if that in any way makes things clearer) can use HDFS or S3 as it's so
 called 'deep store'.

 Combining this it seems, Riak CS and Spark/Shark could be a nice pretty
 tight combo providing interative and adhoc quering through Shark + all
 the
 excellent stuff of Riak through the S3 protocol which they both speak .

 Is this correct?
 Would I loose any of the raw power of Riak when going with Riak CS?
 Anyone
 ever tried this combo?

 Thanks,
 Geert-Jan



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Riak Recap for July 23-31

2013-07-31 Thread John Daily
Greetings, Riak world.

It's time to say farewell to July. There will be many opportunities to see the 
Basho crew out and about in August, so I've included the next couple of weeks 
below, and you can always visit http://basho.com/events to see what's coming to 
your community.

Don't forget that the granddaddy Riak event of them all is looming: RICON West 
in San Francisco in October. Grab your early bird tickets while you can, and 
expect to hear more about the speakers very soon.

http://ricon.io/west.html

John
twitter.com/macintux
 

Riak Recap for July 23-31 
=== 

Basho announced a critical issue impacting 1.4.0 upgrades when Riak
Control is running.
- 
http://lists.basho.com/pipermail/riak-critical-issues_lists.basho.com/2013-July/04.html

Brian Roach announced Riak Java Client 1.1.2, 1.4.0 (and shortly thereafter 
1.4.1).
- 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012729.html
- 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012767.html

Sean Cribbs belatedly announced riak-erlang-client 1.4.0.
- 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012796.html

Now that Riak 1.4 is out the door, Basho engineers are seeking input
on future changes.
- Andrew Thompson: Adding security to Riak
   * 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012730.html
- Brian Roach: Removing HTTP support from the Java client
   * 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012731.html
- Sean Cribbs: Client autoconfig
   * 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012743.html
- Russell Brown: CRDTs
   * 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2013-July/012744.html

O'Reilly published a gentle rant by Eric Redmond on choices in technology.
- 
http://programming.oreilly.com/2013/07/nosql-choices-to-misfit-or-cargo-cult.html


Basho on the Road
===
Tonight, Mark Phillips will be talking Riak 1.4 and RICON West at
StackMob HQ in San Francisco
- July 31 / http://www.meetup.com/San-Francisco-Riak-Meetup/events/129762112/

Tomorrow, Tom Santero and Andrew Stone will be talking consensus
algorithms at the Erlang meetup in NYC
- August 1 / http://www.meetup.com/Erlang-NYC/events/131394712/

A random sampling of Basho memelords will be present at the Riak
meetup in Boston next Monday
- August 5 / http://www.meetup.com/Boston-Riak/events/131992742/

Next Wednesday, in Norwich, England, Christian Dahlqvist will be talking
about Riak design and data modeling
- August 7 / http://www.meetup.com/Norfolk-Developers-NorDev/events/121000182/

Also on Wednesday, in Köln, Germany, Richard Shaw will be talking
about Yokozuna
- August 7 / http://www.nosql-cologne.org

Next Thursday in Herndon, Virginia, Stuart McCaul will be presenting
on Rovio's use of Riak; several other Bashoites will be in attendance
- August 8 / http://www.meetup.com/Riak-DC/events/130773522/

Next Saturday, Hector Castro and Casey Rosenthal will be running a
Riak workshop at FOSSCON
- August 10 / http://fosscon.org


Looking further out, several events take place the following Tuesday:

Andy Gross will be in London talking distributed systems
- August 13 / http://www.meetup.com/cloud-nosql/events/126742782/

Tom Santero will host a drinkup in Atlanta
- August 13 / http://www.meetup.com/Atlanta-Riak-Meetup/events/131521272/

Pavan Venkatesh will lead a discussion of the changes to Riak 1.4 in
Santa Monica
- August 13 / http://www.meetup.com/Los-Angeles-Riak-Meetup/events/132040662/



___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


view total number of workers being used?

2013-07-31 Thread Bhuwan Chawla
I was wondering if there's any way to see the total number of workers
currently in use (i.e. workers used out of worker_limit)?
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Unit testing persistence

2013-07-31 Thread Wagner Camarao
Hi all ~

Great meetup today - looking forward to upgrading to 1.4

I had a question that Mark suggested posting here, and which we then
discussed with a few other folks: how do we unit / integration test
persistence with Riak?

Given a basic dev environment, e.g. running only one physical Riak node
locally with all configs at their defaults, how do we reliably read back the
data we just wrote?

I have tried setting DW=all (durable write, as recommended for best
consistency in the financial example from The Little Riak Book, in the
section for developers on going beyond N/R/W), and I also tried using
{delete_mode, keep} in the riak_kv section of app.config (since I truncate
the buckets after each test suite), but I still get intermittent test
failures, as the data occasionally isn't available for reading right after
writing.

Please note I'm trying to avoid mocking / stubbing, as well as hacks like
retrying the read until a certain timeout.

Ideally I'm looking for a simple configuration or any known best practices.

Thanks
~W
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Unit testing persistence

2013-07-31 Thread Jeremy Ong
Be advised: avoiding mocking/stubbing and keeping your tests unit tests
are mutually exclusive. A unit test by definition should not have any
dependencies whatsoever (even on other modules, let alone a database!), so
what you're describing is really an integration test.
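
For the integration-test side of the question, here is a minimal sketch of
the strictest write-then-read settings using the 1.x Java client (the API
names are from memory and should be treated as assumptions; note that even
w=all/dw=all narrows, but does not fully eliminate, the window in which a
read can miss a fresh write):

    import com.basho.riak.client.IRiakClient;
    import com.basho.riak.client.IRiakObject;
    import com.basho.riak.client.RiakFactory;
    import com.basho.riak.client.bucket.Bucket;

    public class PersistenceIT {
        public static void main(String[] args) throws Exception {
            IRiakClient client = RiakFactory.pbcClient("127.0.0.1", 8087);
            Bucket bucket = client.fetchBucket("it-test").execute();

            // Write: require all N replicas to accept (w=3) and to flush
            // durably to disk (dw=3) before the call returns.
            bucket.store("k1", "v1").w(3).dw(3).execute();

            // Read: require all replicas to respond before returning.
            IRiakObject fetched = bucket.fetch("k1").r(3).execute();
            if (fetched == null || !"v1".equals(fetched.getValueAsString())) {
                throw new AssertionError("read-after-write failed");
            }
            client.shutdown();
        }
    }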

On Wed, Jul 31, 2013 at 9:41 PM, Wagner Camarao wag...@crunchbase.com wrote:
 Hi all ~

 Great meetup today - looking forward to upgrading to 1.4

 I had a question Mark suggested posting here, then we discussed with a few
 other folks too: How do we unit / integration test persistence with riak?

 Given a basic dev environment, e.g. running only one riak physical node
 locally with all configs default, how do we surely read the data we just
 wrote?

 I have tried setting DW=all (Durable Write - as recommended for best
 consistency in the financial example from the little riak book - section for
 developers, more than N/R/W) and tried also using {delete_mode, keep} in
 riak_kv app.config (since I truncate the buckets after each test suite), but
 still, I get intermittent test failures as eventually the data isn't
 available for reading right after writing.

 Please note I'm trying to avoid mocking / stubbing as well as hacks like
 keep trying to read until a certain timeout.

 I'm looking ideally for a simple configuration or any known best practices.

 Thanks
 ~W

 ___
 riak-users mailing list
 riak-users@lists.basho.com
 http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com