[ https://issues.apache.org/jira/browse/PHOENIX-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494751#comment-14494751 ]
James Taylor edited comment on PHOENIX-1779 at 4/14/15 8:05 PM:
----------------------------------------------------------------

bq. Having two parallel arrays sounds more complicated than maintaining a map, IMHO.

But you don't need a map. You've got an index that will get you exactly what you need. It'd be like using a Map<Integer,Object> where the key of the Map is the index. Sure, a map.get(3) will get you the fourth element, but so would array[3] or list.get(3). If you don't want to do parallel arrays, then use a List<Pair<PeekingResultIterator,Integer>>, or, maybe clearer, a List<RoundRobinIteratorState>, where RoundRobinIteratorState is a class with two member variables: PeekingResultIterator iterator and int rowsRead.

> Parallelize fetching of next batch of records for scans corresponding to
> queries with no order by
> --------------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1779
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1779
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Samarth Jain
>            Assignee: Samarth Jain
>         Attachments: PHOENIX-1779.patch, wip.patch, wip3.patch, wipwithsplits.patch
>
>
> Today in Phoenix we parallelize only the first execution of scans, i.e. we
> load only the first batch of records, up to the scan's cache size, in parallel.
> Loading of subsequent batches of records in scanners is essentially serial.
> This could be improved, especially for queries that do not need any kind of
> merge sort on the client, including the ones with no order by clauses.
> This could also potentially improve the performance of UPSERT SELECT
> statements that load data from one table and insert into another. One such
> use case is creating immutable indexes for tables that already have data.
> It could also potentially improve the performance of our MapReduce solution
> for bulk loading data by speeding up the loading/mapping phase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
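James's List<RoundRobinIteratorState> suggestion could look roughly like the sketch below. It is a hypothetical illustration, not code from the attached patches: java.util.Iterator stands in for Phoenix's PeekingResultIterator, and RoundRobinReader and rowsRead(int) are invented names. The point it shows is that the list position alone identifies each iterator's state, so neither a Map<Integer, ...> nor two parallel arrays are needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class RoundRobinSketch {

    // One entry per underlying iterator: the iterator plus the rows read from it so far.
    // java.util.Iterator is an assumed stand-in for Phoenix's PeekingResultIterator.
    static final class RoundRobinIteratorState<T> {
        final Iterator<T> iterator;
        int rowsRead;

        RoundRobinIteratorState(Iterator<T> iterator) {
            this.iterator = iterator;
        }
    }

    // Hypothetical round-robin reader: the list index is the only "key" it needs.
    static final class RoundRobinReader<T> implements Iterator<T> {
        private final List<RoundRobinIteratorState<T>> states = new ArrayList<>();
        private int index = 0; // position of the iterator to read from next

        RoundRobinReader(List<? extends Iterator<T>> iterators) {
            for (Iterator<T> it : iterators) {
                states.add(new RoundRobinIteratorState<>(it));
            }
        }

        @Override
        public boolean hasNext() {
            for (RoundRobinIteratorState<T> s : states) {
                if (s.iterator.hasNext()) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public T next() {
            // Try each iterator at most once, starting from the current position.
            for (int tried = 0; tried < states.size(); tried++) {
                RoundRobinIteratorState<T> state = states.get(index); // direct lookup by index
                index = (index + 1) % states.size();
                if (state.iterator.hasNext()) {
                    state.rowsRead++;
                    return state.iterator.next();
                }
            }
            throw new NoSuchElementException();
        }

        // Rows read so far from the i-th underlying iterator.
        int rowsRead(int i) {
            return states.get(i).rowsRead;
        }
    }

    public static void main(String[] args) {
        RoundRobinReader<String> reader = new RoundRobinReader<>(Arrays.asList(
                Arrays.asList("a1", "a2", "a3").iterator(),
                Arrays.asList("b1").iterator(),
                Arrays.asList("c1", "c2").iterator()));
        List<String> out = new ArrayList<>();
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        System.out.println(out); // [a1, b1, c1, a2, c2, a3]
        System.out.println(reader.rowsRead(0) + " " + reader.rowsRead(1) + " " + reader.rowsRead(2)); // 3 1 2
    }
}
```

Exhausted iterators are simply skipped on later passes; a real implementation might instead remove them from the list once they run dry.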
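The improvement the issue describes, fetching the next batch from every scanner concurrently instead of one scanner after another, can be sketched as follows. This is a simplified illustration under stated assumptions: FakeScanner, nextBatch() and fetchNextBatches() are hypothetical names, not HBase or Phoenix APIs, and cacheSize only mimics the role of the scan's cache size. Because the targeted queries need no merge sort on the client, each round's batches can simply be appended to the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelBatchSketch {

    // Hypothetical stand-in for a scanner that hands out fixed-size batches of rows.
    static final class FakeScanner {
        private final List<Integer> rows;
        private final int cacheSize; // rows per batch, analogous to the scan's cache size
        private int pos = 0;

        FakeScanner(List<Integer> rows, int cacheSize) {
            this.rows = rows;
            this.cacheSize = cacheSize;
        }

        // Returns the next batch, or an empty list once exhausted.
        // synchronized so the position update is safe when called from pool threads.
        synchronized List<Integer> nextBatch() {
            int end = Math.min(pos + cacheSize, rows.size());
            List<Integer> batch = new ArrayList<>(rows.subList(pos, end));
            pos = end;
            return batch;
        }
    }

    // Requests the next batch from every scanner at once, then waits for all of them.
    static List<List<Integer>> fetchNextBatches(List<FakeScanner> scanners, ExecutorService pool) {
        List<CompletableFuture<List<Integer>>> futures = new ArrayList<>();
        for (FakeScanner s : scanners) {
            futures.add(CompletableFuture.supplyAsync(s::nextBatch, pool));
        }
        List<List<Integer>> batches = new ArrayList<>();
        for (CompletableFuture<List<Integer>> f : futures) {
            batches.add(f.join()); // results kept in scanner order
        }
        return batches;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<FakeScanner> scanners = Arrays.asList(
                    new FakeScanner(Arrays.asList(1, 2, 3, 4, 5), 2),
                    new FakeScanner(Arrays.asList(10, 20, 30), 2));
            List<Integer> all = new ArrayList<>();
            while (true) {
                boolean gotRows = false;
                for (List<Integer> batch : fetchNextBatches(scanners, pool)) {
                    gotRows |= !batch.isEmpty();
                    all.addAll(batch); // no merge sort needed for unordered queries
                }
                if (!gotRows) {
                    break;
                }
            }
            System.out.println(all); // [1, 2, 10, 20, 3, 4, 30, 5]
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would more likely prefetch each scanner's next batch in the background while the client consumes the current one; this sketch only shows the concurrent fetch itself.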