I believe that with delta propagation we could keep each collection on a single server, with a redundant copy on another. The collections would have to be really large before they warrant distribution across multiple servers. The performance impact we incur from network hops is too great for us to claim that Geode-Redis is "better" than normal Redis.
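
A minimal sketch of why delta propagation helps here, assuming a list value that records the elements appended since the last update and ships only those to the redundant copy instead of the whole collection. The method names mirror Geode's `org.apache.geode.Delta` interface (`hasDelta`, `toDelta`, `fromDelta`), but this standalone version does not depend on Geode; `DeltaList` and `ship` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical list value; in Geode it would implement org.apache.geode.Delta.
class DeltaList {
    private final List<String> items = new ArrayList<>();
    // Elements appended since the last delta was serialized.
    private final List<String> pendingAppends = new ArrayList<>();

    void rpush(String value) {
        items.add(value);
        pendingAppends.add(value);
    }

    List<String> items() {
        return Collections.unmodifiableList(items);
    }

    boolean hasDelta() {
        return !pendingAppends.isEmpty();
    }

    // Writes only the pending appends (simplification: clears them once written).
    void toDelta(DataOutput out) throws IOException {
        out.writeInt(pendingAppends.size());
        for (String value : pendingAppends) {
            out.writeUTF(value);
        }
        pendingAppends.clear();
    }

    // Applies a delta produced by toDelta on the primary copy.
    void fromDelta(DataInput in) throws IOException {
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            items.add(in.readUTF());
        }
    }

    // Simulates sending the delta from primary to replica; returns the bytes shipped.
    static int ship(DeltaList primary, DeltaList replica) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            primary.toDelta(new DataOutputStream(buffer));
            byte[] bytes = buffer.toByteArray();
            replica.fromDelta(new DataInputStream(new ByteArrayInputStream(bytes)));
            return bytes.length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, the cost of keeping the redundant copy current is proportional to the change rather than to the collection size: after the first sync, pushing one more element ships only a count plus that element.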

I think we could have the following options:

1. have a property that could be set to use either single-server
   collections or the current distributed collections
2. have first-class collection implementations that are distributed by
   nature, as using key:value as the hammer for everything does not make sense
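
Option 1 could be as small as a single startup property. A sketch under stated assumptions: the property name `redis-collection-mode` and its values are hypothetical, not an existing Geode setting, and defaulting to single-server collections is a choice made here for illustration.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical modes matching option 1.
enum CollectionMode { SINGLE_SERVER, DISTRIBUTED }

final class RedisCollectionConfig {
    // Hypothetical property name; not an existing Geode setting.
    static final String PROPERTY = "redis-collection-mode";

    private RedisCollectionConfig() {}

    // Reads the mode, defaulting to single-server collections when unset.
    static CollectionMode collectionMode(Properties props) {
        String raw = props.getProperty(PROPERTY, "single-server");
        switch (raw.trim().toLowerCase(Locale.ROOT)) {
            case "single-server":
                return CollectionMode.SINGLE_SERVER;
            case "distributed":
                return CollectionMode.DISTRIBUTED;
            default:
                throw new IllegalArgumentException("Unknown " + PROPERTY + " value: " + raw);
        }
    }
}
```

Rejecting unknown values up front avoids silently falling back to a mode the operator did not ask for.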

The only value add that the current region implementation has is that collections can scale. BUT how large are really large collections? Thousands? Hundreds of thousands? Millions?

My concern is that any benefit we gain from scaling we'd lose to per-entry overhead. In addition, because the Redis client holds a single server connection, we don't really benefit from the "single" hop optimization that normal Geode clients get.

On 2/15/17 13:34, Real Wes wrote:
Does delta propagation make worrying about frequently updated fat collections 
moot?

On Feb 15, 2017, at 4:29 PM, Dan Smith <dsm...@pivotal.io> wrote:

Doing the spill/unspill option could be pretty tricky to implement, because you
have to do a lot of fancy logic in the transition period. I think Jason's
suggestion of making things configurable might make more sense.

-Dan

On Wed, Feb 15, 2017 at 1:12 PM, Jason Huynh <jhu...@pivotal.io> wrote:
With the suggestion from Wes, the constraint on the names would have to
apply to both small and large collections. We wouldn't want the thing to
explode when it gets converted...

Is there a way to just make it configurable?  If they know they want a
"large" set, somehow let them specify it.  Otherwise go with the "small"
set?

On Wed, Feb 15, 2017 at 1:01 PM Real Wes <thereal...@outlook.com> wrote:

Thinking about this, I think that the “spill”/“unspill” option may
actually be the best solution. If the metrics waffle back and forth
around the threshold, well, that’s the acceptable worst case.

How’s this?:

1) Create a separate region for the collection key
      - for fat collections that are updated frequently
ADVANTAGE: speed of replication
DISADVANTAGE: constraint on key name

2) Put the collection as an entry value:
    - for small collections and read-only fat collections
ADVANTAGE: no need to create a separate region

We would track the metrics and automatically convert based on a
combination of frequency of updates and size.

Next, we define what a fat collection is, e.g. over nnMB.
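
A sketch of the automatic conversion rule, assuming the tracked metrics are the collection's size in bytes and its update rate. It adds one technique not in the proposal: hysteresis, meaning separate (lower) thresholds for unspilling, so a collection hovering near the spill threshold does not flip back and forth. `SpillPolicy` and all threshold values are hypothetical and illustrative only; the real "fat" cutoff is still to be defined.

```java
// Hypothetical placement policy for the spill/unspill proposal.
final class SpillPolicy {
    enum Placement { ENTRY_VALUE, OWN_REGION }

    private final long spillBytes;          // spill when size reaches this...
    private final long unspillBytes;        // ...but unspill only below this lower size
    private final double hotUpdatesPerSec;  // spill only collections updated at least this often
    private final double coldUpdatesPerSec; // unspill when updates fall below this lower rate

    SpillPolicy(long spillBytes, long unspillBytes, double hotUpdatesPerSec, double coldUpdatesPerSec) {
        if (unspillBytes >= spillBytes || coldUpdatesPerSec >= hotUpdatesPerSec) {
            throw new IllegalArgumentException("unspill thresholds must sit below spill thresholds");
        }
        this.spillBytes = spillBytes;
        this.unspillBytes = unspillBytes;
        this.hotUpdatesPerSec = hotUpdatesPerSec;
        this.coldUpdatesPerSec = coldUpdatesPerSec;
    }

    // Decides where the collection should live, given its current placement and metrics.
    Placement next(Placement current, long sizeBytes, double updatesPerSec) {
        if (current == Placement.ENTRY_VALUE) {
            // Only fat collections that are also updated frequently get their own region.
            boolean fatAndHot = sizeBytes >= spillBytes && updatesPerSec >= hotUpdatesPerSec;
            return fatAndHot ? Placement.OWN_REGION : Placement.ENTRY_VALUE;
        }
        // Already spilled: move back only once clearly smaller or clearly colder.
        boolean shrunkOrCold = sizeBytes < unspillBytes || updatesPerSec < coldUpdatesPerSec;
        return shrunkOrCold ? Placement.ENTRY_VALUE : Placement.OWN_REGION;
    }
}
```

The gap between the two thresholds bounds how often a conversion can happen, which limits how much of the tricky transition logic gets exercised; it does not remove the need for that logic.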


On Feb 14, 2017, at 8:12 PM, Jason Huynh <jhu...@pivotal.io> wrote:

The concern with a spill-over threshold is: do you "unspill"? What if
the collection shrinks below the threshold and then teeters around it? If
the user can configure this size, wouldn't they already know whether they
want a "large" or a "small" collection?

I think Swapnil makes a good point that our value add would be that we can
scale those structures, whereas Redis can already do what the "new"
implementation is doing.



On Tue, Feb 14, 2017 at 4:59 PM Galen M O'Sullivan <gosulli...@pivotal.io> wrote:
If we put them in separate regions, we'll have the overhead of looking up
in two regions added to each and every operation, and the overhead of
creating all these regions.

If we really wanted to we could have some threshold at which we spill
collections over into their own regions, and have something like the best
of both worlds. It's more complex, though, and I don't know how many people
actually use truly huge collections.

On Tue, Feb 14, 2017 at 4:21 PM, Hitesh Khamesra <hitesh...@yahoo.com.invalid> wrote:

Jason/Dan: Sorry to hear about that. But both of you have asked the right
question.
It depends on your use case (items 2, 3, 4, 5). For example, "hashes" can be
used to define a key-value pair or a Java bean. In that case it is probably
better to keep the hash at the region-entry level. But if you want to know
the top 10 trending tweets, then you probably want to use a
partition-region for the "sorted-set".


       From: Jason Huynh <jhu...@pivotal.io>
  To: dev@geode.apache.org; "u...@geode.apache.org" <u...@geode.apache.org>;
Hitesh Khamesra <hitesh...@yahoo.com>
  Sent: Tuesday, February 14, 2017 3:15 PM
  Subject: Re: GeodeRedisAdapter improvments/feedback

Hi Hitesh,

Not sure about everyone else, but I had a hard time reading this; however,
I think I figured out what you were describing. The only part I am still
unsure about is "Feedback/vote: both behaviour is desirable". Do you mean
you want feedback and a vote on whether both behaviors, the old
implementation and the new implementation, are desired?

2,3,4) The new implementation would mean all the data for a specific data
structure is contained in a single bucket, so the individual data
structures are not really scalable. How would you allow scaling of a
single data structure?

On Tue, Feb 14, 2017 at 3:05 PM Real Wes <thereal...@outlook.com> wrote:
In what format do you want the feedback, Hitesh? For now I’ll just
comment:
1. Redis Type String
No comments, except that a future Geode value-add would be to extend the
Jedis client so that the K/Vs are not compressed. That way OQL and CQ
will work. The tradeoff is that the data cannot be read by a native
Redis client, but for Geode users it’s great. Call the new client Geodis.

2. List/ Hash/ Set/ SortedSet
Creating a separate region for each collection limits the keys to the
characters allowed in region names: letters (A-Z, a-z), digits (0-9),
hyphen (-) and underscore (_). Everything else is out. Redis users might
start asking why their list named ++^^/## throws an error. Your
suggestion to make it a key rather than a region solves this.
Furthermore, creating a new region every time a new Redis collection is
created is going to be slow. I’m not sure why a region was created, but
I’m sure it made sense to the developer at the time.
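
To make the key-name constraint concrete, here is a sketch that checks whether a Redis key could be used directly as a region name, assuming exactly the character set described above (letters, digits, hyphen, underscore). `RegionNameCheck` is a hypothetical helper; the rule a given Geode version actually enforces should be checked against its documentation.

```java
import java.util.regex.Pattern;

final class RegionNameCheck {
    // Character set as described in this thread: letters, digits, hyphen, underscore.
    private static final Pattern REGION_NAME = Pattern.compile("[A-Za-z0-9_-]+");

    private RegionNameCheck() {}

    // True if the Redis key could be used directly as a region name under that rule.
    static boolean usableAsRegionName(String redisKey) {
        return REGION_NAME.matcher(redisKey).matches();
    }
}
```

Common Redis naming styles such as `user:1000` fail this check as well, not just exotic names like `++^^/##`, which is a further argument for storing collections as entries keyed by name.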

7. Default Config
Can’t we configure a gfsh option to default to the region types we want?
Customer A will want PARTITION but Customer B will want
PARTITION_REDUNDANT_EXPIRATION_PERSISTENT. I wonder if we can consider a
geode> create region --redisType=PARTITION_REDUNDANT_EXPIRATION_PERSISTENT
that makes _all_ Redis regions of that type?



On Feb 14, 2017, at 5:36 PM, Hitesh Khamesra <hitesh...@yahoo.com> wrote:

The current GeodeRedisAdapter implementation is based on
https://cwiki.apache.org/confluence/display/GEODE/Geode+Redis+Adapter+Proposal
We are looking for feedback on Redis commands and their mapping to Geode
regions.

1. Redis Type String
  a. usage "set k1 v1"
  b. Current implementation creates "STRING_REGION" geode-partition-region
     upfront
  c. k1/v1 are the geode-region key/value
  d. Any feedback?

2. List Type
  a. usage "rpush mylist A"
  b. Current implementation maps each list to a geode-partition-region
     (i.e. mylist is a geode-partition-region), with the ability to get
     items from head/tail
  c. Feedback/vote
      -- List type operation at region-entry level;
      -- region-key = "mylist"
      -- region-value = ArrayList (will support all Redis list ops)
  d. Feedback/vote: both behavior is desirable
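
A sketch of the proposed entry-level list, assuming the region value is a plain `java.util.ArrayList` wrapped to expose a few Redis list operations. `RedisListValue` is a hypothetical name; the LRANGE index handling follows Redis's documented semantics (inclusive bounds, negative indexes counted from the tail, out-of-range bounds clamped).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value stored under a single region key such as "mylist".
final class RedisListValue {
    private final List<String> items = new ArrayList<>();

    // RPUSH: append each value at the tail; returns the new length.
    int rpush(String... values) {
        Collections.addAll(items, values);
        return items.size();
    }

    // LPUSH: insert each value at the head in argument order, so "LPUSH k a b" yields [b, a].
    int lpush(String... values) {
        for (String v : values) {
            items.add(0, v);
        }
        return items.size();
    }

    // LPOP: remove from the head; null when empty (Redis returns nil).
    String lpop() {
        return items.isEmpty() ? null : items.remove(0);
    }

    // RPOP: remove from the tail; null when empty.
    String rpop() {
        return items.isEmpty() ? null : items.remove(items.size() - 1);
    }

    // LRANGE: inclusive bounds, negative indexes count from the tail, out-of-range is clamped.
    List<String> lrange(int start, int stop) {
        int n = items.size();
        if (start < 0) start = Math.max(n + start, 0);
        if (stop < 0) stop = n + stop;
        if (stop >= n) stop = n - 1;
        if (start > stop || start >= n) return List.of();
        return new ArrayList<>(items.subList(start, stop + 1));
    }
}
```

`ArrayList` makes head insertion O(n); a deque would make LPUSH/LPOP cheap but loses the index access LRANGE needs, which is one of the trade-offs to weigh for large lists.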


3. Hashes
  a. This represents field-value pairs or a Java bean object
  b. usage "hmset user1000 username antirez birthyear 1977 verified 1"
  c. Current implementation maps each hash to a geode-partition-region
     (i.e. user1000 is a geode-partition-region)
  d. Feedback/vote
    -- Should we map hashes to a region-entry?
    -- region-key = user1000
    -- region-value = map
    -- This will provide Java-bean-like behaviour with tens of
       field-value pairs
    -- Personally I would prefer this.
  e. Feedback/vote: both behaviour is desirable
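
A sketch of the hash-as-entry option, assuming the region value is a `HashMap` of field to value. `RedisHashValue` is a hypothetical name, and in the usage below a plain `HashMap` stands in for the Geode region holding one entry per hash key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value stored under one region key such as "user1000".
final class RedisHashValue {
    private final Map<String, String> fields = new HashMap<>();

    // HMSET key f1 v1 f2 v2 ...: arguments alternate field, value.
    void hmset(String... fieldValuePairs) {
        if (fieldValuePairs.length == 0 || fieldValuePairs.length % 2 != 0) {
            throw new IllegalArgumentException("HMSET needs field/value pairs");
        }
        for (int i = 0; i < fieldValuePairs.length; i += 2) {
            fields.put(fieldValuePairs[i], fieldValuePairs[i + 1]);
        }
    }

    // HGET: null when the field is missing (Redis returns nil).
    String hget(String field) {
        return fields.get(field);
    }

    // HLEN: number of fields.
    int hlen() {
        return fields.size();
    }
}
```

Usage with the example command above: `region.computeIfAbsent("user1000", k -> new RedisHashValue()).hmset("username", "antirez", "birthyear", "1977", "verified", "1")`, after which a single entry lookup serves every HGET on that user.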

4. Sets
  a. This represents a set of unique keys
  b. usage "sadd myset 1 2 3"
  c. Current implementation maps each set to a geode-partition-region
     (i.e. myset is a geode-partition-region)
  d. Feedback/vote
    -- Should we map sets to a region-entry?
    -- region-key = myset
    -- region-value = HashSet
  e. Feedback/vote: both behaviour is desirable
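
A sketch of the set-as-entry option, assuming the region value is a `java.util.HashSet`. `RedisSetValue` is a hypothetical name; the SADD return value follows Redis, which reports how many members were newly added.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical value stored under one region key such as "myset".
final class RedisSetValue {
    private final Set<String> members = new HashSet<>();

    // SADD: returns how many of the given members were not already present.
    int sadd(String... values) {
        int added = 0;
        for (String v : values) {
            if (members.add(v)) {
                added++;
            }
        }
        return added;
    }

    // SISMEMBER: membership test.
    boolean sismember(String value) {
        return members.contains(value);
    }

    // SCARD: number of members.
    int scard() {
        return members.size();
    }
}
```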

5. SortedSets
  a. This represents a set of unique keys, each with a score (use case:
     query the top 10)
  b. usage "zadd hackers 1940 "Alan Kay""
  c. Current implementation maps each sorted set to a
     geode-partition-region (i.e. hackers is a geode-partition-region)
  d. Feedback/vote
    -- Should we map sorted sets to a region-entry?
    -- region-key = hackers
    -- region-value = Sorted Hashset
  e. Feedback/vote: both behaviour is desirable
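
The JDK has no "sorted hash set" type, so a sketch of the entry value has to pick a representation. This one assumes a `HashMap` from member to score for O(1) score lookup plus a `TreeSet` ordered by (score, member) for range queries such as top-10; `RedisSortedSetValue` is a hypothetical name, and the extra members in the usage are illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical value stored under one region key such as "hackers".
final class RedisSortedSetValue {
    // Immutable (score, member) pair ordered by score, then member name.
    private static final class ScoredMember implements Comparable<ScoredMember> {
        final double score;
        final String member;

        ScoredMember(double score, String member) {
            this.score = score;
            this.member = member;
        }

        @Override
        public int compareTo(ScoredMember other) {
            int byScore = Double.compare(score, other.score);
            return byScore != 0 ? byScore : member.compareTo(other.member);
        }
    }

    private final Map<String, Double> scoreByMember = new HashMap<>();
    private final TreeSet<ScoredMember> ordered = new TreeSet<>();

    // ZADD: returns true only when the member is new; an existing member's score is updated.
    boolean zadd(double score, String member) {
        Double previous = scoreByMember.put(member, score);
        if (previous != null) {
            ordered.remove(new ScoredMember(previous, member));
        }
        ordered.add(new ScoredMember(score, member));
        return previous == null;
    }

    // Highest scores first, similar to ZREVRANGE key 0 n-1.
    List<String> top(int n) {
        List<String> result = new ArrayList<>();
        for (ScoredMember e : ordered.descendingSet()) {
            if (result.size() == n) {
                break;
            }
            result.add(e.member);
        }
        return result;
    }
}
```

Keeping both structures in sync on every update is the price of answering both "what is X's score?" and "who is in the top 10?" cheaply.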

6. HyperLogLogs
  a. A HyperLogLog is a probabilistic data structure used to count unique
     things (technically, this is referred to as estimating the
     cardinality of a set).
  b. usage "pfadd hll a b c d"
  c. Current implementation creates "HLL_REGION" geode-partition-region
     upfront
  d. hll becomes the region-key and the value is an HLL object
  e. Any feedback?
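
For reference, a minimal HyperLogLog of the kind that would sit in the region value: 2^p registers, each keeping the longest run of leading zeros seen among hashes routed to it, with the standard small-range (linear counting) correction. This is a textbook sketch, not the adapter's HLL class or Redis's encoding; p = 14 matches the 16384 registers Redis uses, and hashing via `String.hashCode` mixed with the MurmurHash3 64-bit finalizer is a simplification.

```java
// Minimal HyperLogLog sketch (not the adapter's or Redis's implementation).
final class HyperLogLog implements java.io.Serializable {
    private static final long serialVersionUID = 1L;

    private final int p;
    private final int m;
    private final byte[] registers;

    HyperLogLog(int p) {
        if (p < 4 || p > 18) throw new IllegalArgumentException("p out of range");
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // MurmurHash3 fmix64 finalizer, used to spread the bits of String.hashCode.
    private static long mix(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // PFADD-style update: top p bits pick a register, the rest determine the rank.
    void add(String element) {
        long h = mix(element.hashCode());
        int index = (int) (h >>> (64 - p));
        long rest = h << p;
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    // PFCOUNT-style estimate, switching to linear counting for small cardinalities.
    long estimate() {
        double sum = 0.0;
        int zeroRegisters = 0;
        for (byte r : registers) {
            sum += 1.0 / (1L << r);
            if (r == 0) zeroRegisters++;
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeroRegisters > 0) {
            return Math.round(m * Math.log((double) m / zeroRegisters));
        }
        return Math.round(raw);
    }
}
```

The object is a fixed-size byte array regardless of how many elements are added, which is why storing it as a single region value is natural.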

7. Default config for geode-region (vote)
    a. partition region
    b. 1 redundant copy
    c. Persistence
    d. Eviction
    e. Expiration
    f. ?

8. It seems Redis knows the type (list, hash, string, set, ...) of each
key, so for each operation we need to check the type of the key. In the
current implementation we have a different region for each Redis type,
so we have another region (metaTypeRegion) that keeps the type of each
key. This makes every operation in Geode slow, as it needs to verify the
type. For instance, creating a new key needs to check whether it already
exists. We also need to decide whether to allow a type change.
  a. Feedback/vote
      -- type change of key
      -- Can we allow two keys with the same name but two different types
         (as they will end up in two different geode-regions)?
        String type "key1" in string region
        HLL type "key1" in HLL region
  b. any other feedback
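
One way to avoid the metaTypeRegion lookup is to store the type alongside the value in a single entry, so one lookup yields both. A sketch under that assumption: `TypedKeyspace` and `RedisType` are hypothetical names, a `HashMap` stands in for the region, and it answers the vote the way Redis itself behaves (one type per key name, a WRONGTYPE error on mismatch) rather than allowing two keys with the same name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Redis types discussed in this thread.
enum RedisType { STRING, LIST, HASH, SET, SORTED_SET, HLL }

// Hypothetical single keyspace: each entry carries its own type, so one lookup
// returns both type and value and no separate metaTypeRegion is needed.
final class TypedKeyspace {
    private static final class TypedValue {
        final RedisType type;
        final Object value;

        TypedValue(RedisType type, Object value) {
            this.type = type;
            this.value = value;
        }
    }

    private final Map<String, TypedValue> entries = new HashMap<>();

    // Returns the existing value or creates one; rejects a type mismatch as Redis does.
    @SuppressWarnings("unchecked")
    <T> T getOrCreate(String key, RedisType type, Supplier<T> factory) {
        TypedValue entry = entries.computeIfAbsent(key, k -> new TypedValue(type, factory.get()));
        if (entry.type != type) {
            throw new IllegalStateException(
                "WRONGTYPE Operation against a key holding the wrong kind of value");
        }
        return (T) entry.value;
    }

    // Type of the key, or null if absent.
    RedisType typeOf(String key) {
        TypedValue entry = entries.get(key);
        return entry == null ? null : entry.type;
    }
}
```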

9. Transactions:
  a. We will not support transactions in redisAdapter, as Geode
     transactions are limited to a single node.
  b. feedback?

10. Redis COMMAND (https://redis.io/commands/command)
  a. should we implement this "COMMAND" ?

11. Any other redis command we should consider?


Thanks.
Hitesh
