Hi,
I was looking at the following method:
  public void doBulkLoad(Path hfofDir, final Admin admin, Table table,
      RegionLocator regionLocator) throws TableNotFoundException, IOException {
We can optimize the following part of this method:
353   ArrayList<String> familyNames = new ArrayList<String>(families.size());
354   for (HColumnDescriptor family : families) {
355     familyNames.add(family.getNameAsString());
356   }
357   ArrayList<String> unmatchedFamilies = new ArrayList<String>();
358   Iterator<LoadQueueItem> queueIter = queue.iterator();
359   while (queueIter.hasNext()) {
360     LoadQueueItem lqi = queueIter.next();
361     String familyNameInHFile = Bytes.toString(lqi.family);
362     if (!familyNames.contains(familyNameInHFile)) {
363       unmatchedFamilies.add(familyNameInHFile);
364     }
365   }
Line 353 uses an ArrayList for familyNames, and line 362 calls its
"contains" method, which is O(n) in the number of column families, once per
queued item. We could use a HashSet instead, whose "contains" method is O(1)
on average.
This should improve performance for tables with a large number of column
families.
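A minimal, standalone sketch of the proposed change is below. It is not the actual HBase code: the class name UnmatchedFamilies and the method findUnmatched are hypothetical, the column family names and queued families are passed in as plain lists, and a UTF-8 String conversion stands in for Bytes.toString so the example runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone class; not part of HBase.
public class UnmatchedFamilies {

    // tableFamilies: names of the column families defined on the table.
    // queuedFamilies: family bytes taken from each queued HFile
    // (stands in for lqi.family in the original loop).
    static List<String> findUnmatched(List<String> tableFamilies,
                                      List<byte[]> queuedFamilies) {
        // HashSet lookups are O(1) on average, versus O(n) for ArrayList.contains.
        Set<String> familyNames = new HashSet<>(tableFamilies);

        List<String> unmatchedFamilies = new ArrayList<>();
        for (byte[] family : queuedFamilies) {
            // Assumption: UTF-8 decoding in place of Bytes.toString.
            String familyNameInHFile = new String(family, StandardCharsets.UTF_8);
            if (!familyNames.contains(familyNameInHFile)) {
                unmatchedFamilies.add(familyNameInHFile);
            }
        }
        return unmatchedFamilies;
    }

    public static void main(String[] args) {
        List<String> tableFamilies = Arrays.asList("cf1", "cf2");
        List<byte[]> queued = Arrays.asList(
            "cf1".getBytes(StandardCharsets.UTF_8),
            "cf3".getBytes(StandardCharsets.UTF_8));
        System.out.println(findUnmatched(tableFamilies, queued)); // prints [cf3]
    }
}
```

The loop body and the resulting unmatchedFamilies list are unchanged from the original; only the lookup structure differs, so the total cost drops from O(m * n) to roughly O(m + n) for m queued items and n column families.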
This is my first time here; I can make this change if everything looks fine.
Regards,
Himanshu Verma