[ 
https://issues.apache.org/jira/browse/CASSANDRA-6091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798976#comment-13798976
 ] 

Jeremy Hanna commented on CASSANDRA-6091:
-----------------------------------------

I think a factor that we've overlooked is data locality.  With smaller token 
ranges and the same input split size, there's a higher chance that a split 
will not fall within a single vnode token range.  I have observed in the job 
counters that, with vnodes enabled, only about a third of the tasks are data 
local.  That would probably need some testing.  The user was doing some tests 
with the input split size.

In any case, if this is borne out in testing, it is the bigger problem.
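
As a rough illustration of the locality concern, here is a toy model in 
Python.  It is only a sketch under stated assumptions, not how Cassandra's 
Hadoop input format actually computes splits: the ring is a range of integer 
tokens, every token range has equal width, and splits are laid out back to 
back with a fixed size.  The function name and parameters are hypothetical.

```python
def fraction_within_single_range(ring_size, num_ranges, split_size):
    """Toy model: fraction of fixed-size splits that fall entirely
    inside one token range.

    Assumptions (not taken from Cassandra): tokens are the integers
    0..ring_size-1, all token ranges have the same width, and
    ring_size is divisible by num_ranges.
    """
    range_width = ring_size // num_ranges
    total = 0
    contained = 0
    start = 0
    while start + split_size <= ring_size:
        end = start + split_size - 1  # last token of the split, inclusive
        # The split is contained if its first and last token share a range.
        if start // range_width == end // range_width:
            contained += 1
        total += 1
        start += split_size
    return contained / total if total else 0.0


# Same split size, progressively smaller token ranges.
for ranges in (3, 8, 16):
    print(ranges, fraction_within_single_range(1200, ranges, 100))
```

In this model, keeping the split size fixed while shrinking the token ranges 
lowers the share of splits that stay inside one range, which is the effect 
that would need to be confirmed against real job counters.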

> Better Vnode support in hadoop/pig
> ----------------------------------
>
>                 Key: CASSANDRA-6091
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6091
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
>
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if 
> vnodes are enabled. Also, hadoop performance on vnode-enabled nodes is 
> bad because there are so many splits.
> The idea is to combine vnode splits into big pseudo splits so it works as if 
> vnodes are disabled for hadoop/pig jobs.
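
The combining idea in the description above could be sketched roughly as 
follows.  This is a hypothetical Python illustration, not the patch for this 
issue: it assumes each per-vnode split is a (token_range, host) pair and 
groups splits that share a host into larger pseudo splits of at most 
max_per_group ranges.  All names here are made up for the example.

```python
from collections import defaultdict


def combine_splits(splits, max_per_group):
    """Group per-vnode splits by host into larger pseudo splits.

    splits: iterable of (token_range, host) pairs (hypothetical shape).
    Returns a list of (host, [token_range, ...]) pairs, each holding at
    most max_per_group token ranges.
    """
    by_host = defaultdict(list)
    for token_range, host in splits:
        by_host[host].append(token_range)

    combined = []
    for host, ranges in by_host.items():
        # Chunk each host's ranges so no pseudo split grows unbounded.
        for i in range(0, len(ranges), max_per_group):
            combined.append((host, ranges[i:i + max_per_group]))
    return combined


vnode_splits = [
    ((0, 10), "node1"),
    ((10, 20), "node2"),
    ((20, 30), "node1"),
    ((30, 40), "node1"),
]
print(combine_splits(vnode_splits, max_per_group=2))
```

Grouping by host keeps each pseudo split local to one node, so fewer tasks 
are launched without giving up locality in this simplified picture.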



--
This message was sent by Atlassian JIRA
(v6.1#6144)
