[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2016-03-19 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15201523#comment-15201523
 ] 

Aleksey Yeschenko commented on CASSANDRA-4047:
--

[~pkolaczk] Please reopen if anything has changed since Dec 2014 and this is 
now relevant for Spark.

> Bulk hinting
> 
>
> Key: CASSANDRA-4047
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Brandon Williams
>  Labels: hints
> Attachments: 4047-2.0-wip.txt, 4047-wip.txt
>
>
> With the introduction of the BulkOutputFormat, there may be cases where 
> someone would like to tolerate node failures and have the job complete, but 
> afterwards, since we streamed, they have to repair or rely on read repair.  We 
> don't currently have any way of hinting streams, but a node could take a 
> snapshot before acknowledging the stream session, then remember to send the 
> files in the snapshot to the unavailable nodes when they come back up.  This 
> isn't quite ideal, since of course the node may have compacted these files; 
> however, it's much simpler than any sort of key tracking at this scale.
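
The snapshot idea in the description above could be modelled roughly as
follows. This is only a bookkeeping sketch in Python, not Cassandra code:
the names SnapshotHints, before_ack and on_node_up are hypothetical, the
"snapshot" is just a list of file names, and nothing is actually streamed.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class PendingSnapshot:
    files: List[str]   # files captured before the stream session was acknowledged
    targets: Set[str]  # replicas that were unavailable and still need the files


@dataclass
class SnapshotHints:
    """Hypothetical per-node record of snapshots kept for unavailable replicas."""
    snapshots: Dict[str, PendingSnapshot] = field(default_factory=dict)

    def before_ack(self, session_id: str, streamed_files: Iterable[str],
                   unavailable: Iterable[str]) -> None:
        # Only keep a snapshot when at least one replica missed the stream.
        targets = set(unavailable)
        if targets:
            self.snapshots[session_id] = PendingSnapshot(list(streamed_files), targets)

    def on_node_up(self, node: str) -> List[str]:
        # Collect every file the recovered node missed, then drop snapshots
        # that no longer have any outstanding target.
        to_send: List[str] = []
        for session_id in list(self.snapshots):
            snap = self.snapshots[session_id]
            if node in snap.targets:
                to_send.extend(snap.files)
                snap.targets.discard(node)
            if not snap.targets:
                del self.snapshots[session_id]
        return to_send
```

The sketch does not address the caveat raised in the description (files
that have already been compacted); it only shows what a node would have to
remember.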



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2014-12-29 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259954#comment-14259954
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4047:
---

[~jbellis] So far I haven't heard anyone complaining about the lack of this feature.

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
  Labels: hints
 Fix For: 3.0

 Attachments: 4047-2.0-wip.txt, 4047-wip.txt




[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2014-12-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260160#comment-14260160
 ] 

Jeremy Hanna commented on CASSANDRA-4047:
-

[~pkolaczk] I believe this blocks a reliable bulk output format for Spark and 
Hadoop in case of failure to stream to all of the replicas.



[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2014-12-29 Thread Piotr Kołaczkowski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260384#comment-14260384
 ] 

Piotr Kołaczkowski commented on CASSANDRA-4047:
---

Yeah, you're right. It will be of higher priority when we add support for bulk 
writes in Spark.



[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2014-12-16 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249117#comment-14249117
 ] 

Jonathan Ellis commented on CASSANDRA-4047:
---

Where do we go from here?  [~pkolaczk] is this important for your team?

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Yuki Morishita
  Labels: hints
 Fix For: 3.0

 Attachments: 4047-2.0-wip.txt, 4047-wip.txt




[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2013-11-21 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829314#comment-13829314
 ] 

Brandon Williams commented on CASSANDRA-4047:
-

Actually, I meant I couldn't get the ks/cf name and ranges on the client (bulk 
loader) side, though I guess looking at this we can *almost* get there, except 
we need them for onFailure and I'm not quite sure how to do that.

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Yuki Morishita
 Fix For: 2.0.3

 Attachments: 4047-2.0-wip.txt, 4047-wip.txt




[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2013-03-26 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614070#comment-13614070
 ] 

Brandon Williams commented on CASSANDRA-4047:
-

I think I have a plan here that can avoid the hell of making MS multi-port.  
Akin to how shuffle works with creating a 'schedule' for transfers, upon 
failure we can insert the range that failed into a system table on a replica 
that succeeded via thrift, and when the node recovers we can repair the range.
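
A rough sketch of the client-side half of this plan, in Python for
brevity: after streaming a range, pick a replica that succeeded and plan
one hint row per replica that failed. The HintRow layout, the plan_hints
helper and the "first succeeded replica holds the hint" rule are all
assumptions made for illustration; the actual system table schema and the
thrift insert are not shown.

```python
from typing import Dict, List, NamedTuple, Tuple


class HintRow(NamedTuple):
    holder: str                   # replica that received the stream and stores the hint
    failed_replica: str           # replica that must be repaired once it recovers
    keyspace: str
    table: str
    token_range: Tuple[int, int]  # illustrative (start, end) token pair


def plan_hints(keyspace: str, table: str, token_range: Tuple[int, int],
               outcomes: Dict[str, bool]) -> List[HintRow]:
    """Map per-replica stream outcomes (True = succeeded) to hint rows."""
    succeeded = sorted(r for r, ok in outcomes.items() if ok)
    failed = sorted(r for r, ok in outcomes.items() if not ok)
    if not succeeded:
        # Nobody can hold the hint; the caller has to fail the job or
        # fall back to a regular repair.
        return []
    holder = succeeded[0]
    return [HintRow(holder, f, keyspace, table, token_range) for f in failed]
```

For example, with outcomes {"a": True, "b": False, "c": True} the sketch
plans a single row held by "a" that names "b" as the replica to repair.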

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 2.0




[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2013-03-26 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614924#comment-13614924
 ] 

Jonathan Ellis commented on CASSANDRA-4047:
---

> we can insert the range that failed into a system table on a replica that 
> succeeded via thrift

You're right, we pretty much need to do that anyway, since we have no 
guarantees that whatever process the Hadoop job is running in will be around 
later.  It really needs to take the approach of injecting the range-hint into 
a C* node.

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Carl Yeksigian
 Fix For: 2.0




[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2012-06-21 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398493#comment-13398493
 ] 

Brandon Williams commented on CASSANDRA-4047:
-

> Alternatively... we can do repairs of specific ranges now. What if we 
> stored as our hint the range we streamed, and the node that went down, and 
> then the live node will run a partial repair with that replica when it comes 
> back up?

This sounds like a good way to do it.  One wrinkle, though, is communication: 
now that bulk loading doesn't have an MS to speak with, it only has streaming 
or thrift available, and shunting this into either seems awkward.

 Bulk hinting
 

 Key: CASSANDRA-4047
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4047
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Brandon Williams
Assignee: Brandon Williams
 Fix For: 1.2






[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2012-06-21 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398586#comment-13398586
 ] 

Jonathan Ellis commented on CASSANDRA-4047:
---

Is this where we throw up our hands and finally add multi-port ability for 
MessagingService?





[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2012-06-20 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397956#comment-13397956
 ] 

Brandon Williams commented on CASSANDRA-4047:
-

The tricky part is, who stores the hint?  Since it's a non-member doing the 
streaming... I guess it could just choose some other target at random, but by 
the time it does that the sstable might already be compacted away.





[jira] [Commented] (CASSANDRA-4047) Bulk hinting

2012-06-20 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13398214#comment-13398214
 ] 

Jonathan Ellis commented on CASSANDRA-4047:
---

Ugh, that is pretty messy.

One complication is that the outputformat node can't assume that it will know 
ahead of time which replicas it will need to generate hints for (since nodes 
can go down after it's started streaming everywhere).

I think that means we'd need to stream a *second* copy (since the first may 
have been compacted already) to one of the nodes, after we learn that hints are 
needed.

Alternatively...  we can do repairs of specific ranges now.  What if we stored 
as our hint the range we streamed, and the node that went down, and then the 
live node will run a partial repair with that replica when it comes back up?
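
To make the range-hint alternative concrete, here is a minimal Python
sketch of what the live node would keep and do. RangeHint, RangeHintStore
and the repair callback are hypothetical names; the callback stands in for
whatever would actually trigger a partial repair of the hinted range, and
the store is in-memory rather than persisted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RangeHint:
    keyspace: str
    table: str
    token_range: Tuple[int, int]  # the range that was streamed (illustrative)
    down_node: str                # the replica that missed the stream


@dataclass
class RangeHintStore:
    """In-memory stand-in for wherever the live node would persist hints."""
    pending: Dict[str, List[RangeHint]] = field(default_factory=dict)

    def record(self, hint: RangeHint) -> None:
        self.pending.setdefault(hint.down_node, []).append(hint)

    def on_node_up(self, node: str,
                   repair: Callable[[RangeHint], bool]) -> List[RangeHint]:
        """Run one partial repair per hinted range; keep hints whose repair failed."""
        done: List[RangeHint] = []
        retry: List[RangeHint] = []
        for hint in self.pending.pop(node, []):
            (done if repair(hint) else retry).append(hint)
        if retry:
            self.pending[node] = retry
        return done
```

The hint is tiny compared with a second copy of the sstable, which is the
appeal of this approach; the cost is that the repair itself has to move
the data later.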
