[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-04-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13956701#comment-13956701
 ] 

Benedict commented on CASSANDRA-6477:
-

New suggestion:

Since we're performing read-before-write anyway with this suggestion, why not 
simply perform a _local only_ read-before-write on each of the nodes that owns 
the main record whilst writing the update - instead of issuing a complex 
tombstone, we simply issue a delete for whichever value is older on reconcile.  
Since we always CAS local updates, we will never get missed deletes, however we 
will issue redundant/duplicate deletes (RF many) - but they should be coalesced 
in memtable almost always, so it's a network cost only. There are probably 
tricks we can do to mitigate this cost, though, e.g. having each node 
(deterministically) pick two of the possible owners of the 2i entry to send the 
deletes it encounters to, to minimise replication of effort but also ensure 
message delivery to all nodes.

Result is we keep compaction logic exactly the same, and we retain 
approximately the same consistency guarantees we currently have.
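
A minimal sketch of the suggestion above, under loud assumptions: the replica stores are plain dicts, timestamps are integers, and {{apply_local}} and {{delete_targets}} are hypothetical names, not Cassandra APIs. Each replica of the main record reconciles the incoming value against its local copy and reports the older value as the index entry to delete. The rotation used to choose two index owners is only one possible deterministic choice.

```python
def apply_local(store, key, value, ts):
    """Local-only read-before-write: reconcile, return the losing value to delete (or None)."""
    local = store.get(key)
    incoming = (value, ts)
    if local is None:
        store[key] = incoming
        return None
    # Last-write-wins; ties keep the local copy (a simplification).
    winner, loser = (incoming, local) if ts > local[1] else (local, incoming)
    store[key] = winner
    return loser[0] if loser[0] != winner[0] else None


def delete_targets(replica_position, index_owners, count=2):
    """Deterministically pick `count` index owners by rotating on the replica's position."""
    n = len(index_owners)
    return [index_owners[(replica_position + i) % n] for i in range(min(count, n))]


# Two replicas see the racing updates (25 and 26) in opposite orders.
replica_a = {"user1": (24, 1)}
replica_b = {"user1": (24, 1)}
deletes_a = [apply_local(replica_a, "user1", 25, 100),
             apply_local(replica_a, "user1", 26, 101)]
deletes_b = [apply_local(replica_b, "user1", 26, 101),
             apply_local(replica_b, "user1", 25, 100)]

# With three replicas and three index owners, every owner is covered twice.
owners = ["n1", "n2", "n3"]
covered = {t for pos in range(3) for t in delete_targets(pos, owners)}
```

In this toy run both replicas converge on 26 and both issue deletes for 24 and 25, which illustrates the redundant (but harmless) duplicate deletes mentioned above.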

> Partitioned indexes
> ---
>
> Key: CASSANDRA-6477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Jonathan Ellis
> Fix For: 3.0
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955527#comment-13955527
 ] 

Jeremiah Jordan commented on CASSANDRA-6477:


[~benedict] Suppose two threads update age = null.  Both generate the tombstone 
{{24, user1->null}}; since both update to the same value, those are not a 
problem.  We also need to append {{null: user1}} to the index.  Then the update 
age=25 generates tombstone {{null, user1->25}} and age=26 generates tombstone 
{{null, user1->26}}.  Those two tombstones will be resolved on 
compaction/memtable clash, or when someone queries for age=null.  This will 
require keeping track of null columns in the index.  Something similar would 
need to be done for a full delete of the row.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955399#comment-13955399
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

That's why Sylvain said, it's "eventually consistent, but with no good user 
control about how eventual."



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955412#comment-13955412
 ] 

Benedict commented on CASSANDRA-6477:
-

[~jjordan] is that in response to me? Because I don't see how this would work: 
if both deleted 24 and inserted 25 and 26, then we now have a record of both 25 
and 26 mapping to user1, despite only one of them being true, and no means of 
tidying it up. So people can indefinitely look up on both values. This is only 
resolved if we look up the original record after every 2i result, which maybe 
was always the plan. I'm not sure.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955400#comment-13955400
 ] 

Jeremiah Jordan commented on CASSANDRA-6477:


If you have the race, you may briefly see the other value, but it's a race, and 
it would be just as if you had read before update #2 happened. So as long as 
the period of time in which you can get the "wrong" data is small, it is OK.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955397#comment-13955397
 ] 

Benedict commented on CASSANDRA-6477:
-

bq. No, you resolve it in compaction or on lookup of "24".

That only resolves deletes. How do you resolve *seeing the wrong data*?



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955394#comment-13955394
 ] 

Jeremiah Jordan commented on CASSANDRA-6477:


bq. I may be being dim here, but it seems to me that with this scheme you would 
need to write a reverse record of 25, user1->replaced 24, so when you lookup on 
25, you can then read 24 and check there were no competing updates? Either that 
or read the original record, which sort of defeats the point of 
denormalisation...

No, you resolve it in compaction or on lookup of "24".  Compaction sees the two 
different tombstones for 24 and then resolves them to the correct new value, 
deleting the wrong new value.  Or a look up of "24" pulls in the two 
tombstones, resolves them to the correct one, deletes the wrong one, and 
returns none to the user.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955384#comment-13955384
 ] 

Jeremiah Jordan commented on CASSANDRA-6477:


bq. I'll note that the idea above has the downside to be only eventually 
consistent, but with no good user control about how eventual (we're dependent 
on when read/compaction happen to "heal" the "denormalized index").

I think this might be OK, as this is really only an issue in the case of a 
race, so both tombstones will end up in memtables and be resolved immediately, 
or in sstables written near each other in time (which should hopefully compact 
together fairly quickly).  In both cases resolving the conflict *should* happen 
fairly quickly, though there are probably edge cases.

The issue I see here is that compaction now has to issue queries, and we need 
to make sure the deletes issued by compaction MUST happen, or else the index 
will get out of whack, and we will have already thrown out the extra tombstone.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955358#comment-13955358
 ] 

Benedict commented on CASSANDRA-6477:
-

I may be being dim here, but it seems to me that with this scheme you would 
need to write a reverse record of 25, user1->replaced 24, so when you lookup on 
25, you can then read 24 and check there were no competing updates? Either that 
or read the original record, which sort of defeats the point of 
denormalisation...



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955333#comment-13955333
 ] 

Sylvain Lebresne commented on CASSANDRA-6477:
-

I'll note that the idea above has the downside to be only eventually 
consistent, but with no good user control about how eventual (we're dependent 
on when read/compaction happen to "heal" the "denormalized index").



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955262#comment-13955262
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

This does mean that a tombstone is not "just a tombstone," i.e., we will have 
to keep all tombstones of this type for gcgs (gc_grace_seconds) or a similar 
period, not just "the most recent post-merge tombstone" as currently.

But it should be relatively rare to have racing tombstones, so the penalty vs 
the status quo is not actually large in practice.

/cc [~mstump]



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955260#comment-13955260
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

Sylvain had a different idea:

Instead of just writing a {{24, user1}} tombstone, write a tombstone that 
indicates what the value changed to: {{24, user1 -> 25}} for one thread, and 
{{24, user1 -> 26}} for the other.

When the tombstones are merged for compaction or read, you can say "wait, 2 
people tried to erase that, one with 25 the other with 26, let's check which 
one has a higher timestamp and delete any obsolete entries."
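
A rough sketch of that merge step, assuming a hypothetical {{Tombstone}} record that carries the new value and a write timestamp, and an index modelled as a set of (value, key) pairs. None of these names come from Cassandra, and tie-breaking on equal timestamps is not addressed here.

```python
from collections import namedtuple

# Hypothetical tombstone shape: "old_value, key -> new_value" plus a write timestamp.
Tombstone = namedtuple("Tombstone", "old_value key new_value timestamp")


def resolve(tombstones):
    """Given racing tombstones for one (old_value, key), return (winner, obsolete entries)."""
    winner = max(tombstones, key=lambda t: t.timestamp)
    obsolete = sorted({(t.new_value, t.key)
                       for t in tombstones
                       if t.new_value != winner.new_value})
    return winner, obsolete


# Index state after both racing writers have inserted their new entry.
index = {(25, "user1"), (26, "user1")}
racing = [
    Tombstone(24, "user1", 25, timestamp=100),
    Tombstone(24, "user1", 26, timestamp=101),
]

winner, obsolete = resolve(racing)
for entry in obsolete:
    index.discard(entry)  # delete the index entry written by the losing update
```

Here the update to 26 has the higher timestamp, so the {{25: user1}} entry is treated as obsolete and removed.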




[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2014-03-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955258#comment-13955258
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

bq. The problem is that this means we can't do lazy updates of the index; we 
need to keep the index perfectly (or, "eventually perfectly") in sync with the 
base table.

To clarify: Suppose you have an index on the age of users, and we have an 
entry for {{24: user1}} in the index table.  Now two threads update user1's 
age; one to 25, and one to 26.  Each thread will

# Read existing age
# Delete index entry for existing age
# Update user record and insert index entry for new age

The problem is if each thread reads the existing age of 24, then we'll end up 
with both {{25: user1}} and {{26: user1}} index entries.  (Atomic batches do not 
help with this.)  With normal indexes, we clean up stale entries at compaction 
+ read time; we could still do this here but the performance penalty is a lot 
higher.
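
The interleaving can be reproduced with a small deterministic simulation. This is only an illustration: the dicts standing in for the base table and index table, and the helper names, are hypothetical.

```python
# Hypothetical in-memory stand-ins for the base table and the index table.
base = {"user1": 24}
index = {24: {"user1"}}


def read_age(user):
    # Step 1: read the existing age.
    return base[user]


def apply_update(user, old_age, new_age):
    # Step 2: delete the index entry for the age this writer read earlier.
    index.get(old_age, set()).discard(user)
    # Step 3: update the user record and insert the index entry for the new age.
    base[user] = new_age
    index.setdefault(new_age, set()).add(user)


# Both writers complete step 1 before either one proceeds to steps 2 and 3.
seen_by_a = read_age("user1")
seen_by_b = read_age("user1")
apply_update("user1", seen_by_a, 25)
apply_update("user1", seen_by_b, 26)

# Index values that still point at user1 but no longer match the base row.
stale = sorted(age for age, users in index.items()
               if "user1" in users and base["user1"] != age)
```

Because neither writer ever read 25, nobody deletes the {{25: user1}} entry, and it remains in the index alongside {{26: user1}}.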

Sylvain had another idea.





[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2013-12-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854548#comment-13854548
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

The counterpoint is that we shouldn't require ~12 client codebases (if done by 
the driver) or 1000s (if done by app code) to invent this instead of the server.




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2013-12-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854537#comment-13854537
 ] 

Aleksey Yeschenko commented on CASSANDRA-6477:
--

For the record, I think we should leave it to people's client code. We don't 
need more complexity on our read/write paths when this can be done client-side.



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2013-12-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845819#comment-13845819
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

The most straightforward approach is to take a similar approach to our local 
indexes:

# At insert/update time, add a new index entry (as part of an atomic batch with 
the original update), with the timestamp of the data cell
# At read time, fetch the rows indicated by the index and remove stale index 
entries.  Since we delete with the same timestamp as the index entry, this is 
safe wrt concurrent updates
# We can still use compaction of the base table to clean out stale records, but 
this will now generate updates or hints to the index partition

The big drawback is that reads require an O(N) multiget in the coordinator: 
reading the index entries is a single request, but then each row to fetch may 
be on a different replica.

Put another way, this will give us indexes that are good at very high 
cardinality -- ideally a single row for each indexed value -- to go with our 
existing low-cardinality indexes, but we still have a hole for "medium 
cardinality" data.
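
A toy model of the first two steps, assuming in-memory dicts in place of the base table and index partitions; {{write}}, {{delete_index_entry}} and {{query}} are hypothetical helpers, not server code. The delete uses the index entry's own timestamp, so an entry re-inserted later with a newer timestamp would survive it.

```python
# Hypothetical model: base maps key -> (age, timestamp);
# index maps age -> {key: timestamp of the data cell}.
base = {}
index = {}


def write(key, age, ts):
    # Step 1: the new index entry carries the timestamp of the data cell.
    if key not in base or base[key][1] < ts:
        base[key] = (age, ts)
    index.setdefault(age, {})[key] = ts


def delete_index_entry(age, key, ts):
    # A delete at `ts` only shadows entries written at or before `ts`.
    entries = index.get(age, {})
    if key in entries and entries[key] <= ts:
        del entries[key]


def query(age):
    # Step 2: one read for the index entries, then one base-row fetch per key
    # (the O(N) multiget); stale entries are removed along the way.
    results = []
    for key, entry_ts in list(index.get(age, {}).items()):
        current_age, _ = base[key]
        if current_age == age:
            results.append(key)
        else:
            delete_index_entry(age, key, entry_ts)
    return results


write("user1", 24, ts=1)
write("user1", 25, ts=2)   # the old {{24: user1}} entry is left behind, lazily
rows_24 = query(24)        # finds the stale entry and cleans it up
rows_25 = query(25)
```

Each candidate key costs a separate base-table lookup in {{query}}, which is where the per-row fetch cost described above shows up.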



[jira] [Commented] (CASSANDRA-6477) Partitioned indexes

2013-12-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845818#comment-13845818
 ] 

Jonathan Ellis commented on CASSANDRA-6477:
---

Most application-maintained indexes solve this problem by denormalizing the 
base table row into the index entry.  The problem is that this means we can't 
do lazy updates of the index; we need to keep the index perfectly (or, 
"eventually perfectly") in sync with the base table.  Which in turn means we 
need to linearize updates to an indexed table.  That was a performance hit but 
otherwise reasonable when we did that for local indexes; for partitioned 
indexes it's not feasible.

I suppose we could punt and say "we'll give you a denormalized index but you 
have to swear that only one client will update any given row in that table at a 
time" which is actually a fairly common use case...  but it does seem like the 
sort of thing that will bite the incautious user.  Worse, it will appear to 
work but give subtly incorrect results.
