[ https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059978#comment-13059978 ]
Jeremy Hanna commented on CASSANDRA-2855:
-----------------------------------------

{quote}
What I think we could do is not bother including empty rows in the resultset, IF we are doing a slice query for the entire row. (Since, as soon as the tombstones expire, they will be gone anyway.)
{quote}

Yeah - our primary concern is tombstones. It would be great to get that done at a lower level.

> Add hadoop support option to skip rows with empty columns
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-2855
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>              Labels: hadoop
>
> We have been finding that range ghosts appear in results from Hadoop via Pig.
> This could also happen if rows don't have data for the slice predicate that
> is given. This leads to having to do a painful amount of defensive checking
> on the Pig side, especially in the case of range ghosts.
> We would like to add an option to skip rows that have no column values in them.
> That functionality existed before in core Cassandra but was removed because
> of the performance penalty of that checking. However, the Hadoop support in
> the RecordReader is batch oriented anyway, so individual row reading
> performance isn't as much of an issue. Also, we would make it an optional
> config parameter for each job, so people wouldn't have to incur that
> penalty if they are confident that there won't be those empty rows or they
> don't care.
> It could be a parameter named cassandra.skip.empty.rows, set to true/false.
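
The behaviour proposed in the description could look roughly like the sketch below. It is an illustration only, not the patch for this issue: plain Java collections stand in for the rows a RecordReader would return, and the class SkipEmptyRows, the method rowsToEmit, and reading the flag from a system property are all assumptions made for the example. Only the name cassandra.skip.empty.rows comes from the description.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SkipEmptyRows {
    // Property name as proposed in the issue description; how it would
    // actually be read from the job configuration is an assumption here.
    public static final String SKIP_EMPTY_ROWS = "cassandra.skip.empty.rows";

    /**
     * Returns the keys of the rows that would be handed to the job.
     * Each row is modelled as rowKey -> (columnName -> value).
     * When skipEmpty is true, rows with no columns (range ghosts, or rows
     * with no data for the slice predicate) are left out.
     */
    public static List<String> rowsToEmit(Map<String, Map<String, String>> rows,
                                          boolean skipEmpty) {
        List<String> keys = new ArrayList<String>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (skipEmpty && row.getValue().isEmpty()) {
                continue; // empty row: skip it instead of passing it to Pig
            }
            keys.add(row.getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> rows =
                new LinkedHashMap<String, Map<String, String>>();

        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("name", "value");
        rows.put("live", columns);
        rows.put("ghost", new LinkedHashMap<String, String>()); // no columns

        // Defaults to false, so the extra check is only paid for on request.
        boolean skipEmpty =
                Boolean.parseBoolean(System.getProperty(SKIP_EMPTY_ROWS, "false"));
        System.out.println(rowsToEmit(rows, skipEmpty));
    }
}
```

Running it with -Dcassandra.skip.empty.rows=true drops the "ghost" row; leaving the property unset keeps the current behaviour, which matches the intent that jobs opt in to the extra check.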