[
https://issues.apache.org/jira/browse/HIVE-1641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923168#action_12923168
]
He Yongqiang commented on HIVE-1641:
------------------------------------
Some tests are failing because of plan changes.
Can you refresh the diff?
Here are some more minor comments. You can fix them in follow-up JIRAs or in
your next patch (some of them are only a few lines of change).
1.
NOTSKIPBIGTABLE is defined in both AbstractMapJoinOperator and
CommonJoinOperator. And let's not use 'static'.
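A minimal sketch of declaring the field once, assuming AbstractMapJoinOperator inherits from CommonJoinOperator. The class names are hypothetical stand-ins, and the List<Object> type is a placeholder for whatever type the patch actually uses:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for CommonJoinOperator / AbstractMapJoinOperator;
// the field's real type and initializer in the patch may differ.
class CommonJoinOperatorSketch {
  // Declared once in the base class as an instance field (not static),
  // so each operator instance owns its own copy. A non-static field would
  // normally use camelCase rather than NOTSKIPBIGTABLE.
  protected final List<Object> notSkipBigTable = new ArrayList<Object>();
}

class AbstractMapJoinOperatorSketch extends CommonJoinOperatorSketch {
  // No redeclaration here: the subclass uses the inherited field.
  List<Object> getNotSkipBigTable() {
    return notSkipBigTable;
  }
}
```

Keeping a single declaration also avoids the subclass field silently hiding the base-class field of the same name.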
2.
In MapJoinObjectKey, metadataTag is always -1, and we serialize and deserialize
it for each key. We can avoid this by simply assuming that metadataTag is -1.
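A minimal sketch of the suggested change, assuming MapJoinObjectKey serializes itself through java.io.Externalizable. The class name is hypothetical and a String[] stands in for the real key objects and their SerDe; the point is only that the tag is never written and reads always report -1:

```java
import java.io.*;

// Hypothetical, simplified key class; not the real MapJoinObjectKey.
class MapJoinKeySketch implements Externalizable {
  // Always -1 for map-join keys, so it is implied instead of serialized.
  static final int METADATA_TAG = -1;

  private String[] key;

  public MapJoinKeySketch() { } // public no-arg constructor required by Externalizable

  MapJoinKeySketch(String[] key) { this.key = key; }

  int getMetadataTag() { return METADATA_TAG; }

  String[] getKey() { return key; }

  @Override
  public void writeExternal(ObjectOutput out) throws IOException {
    // No out.writeInt(metadataTag): saves those bytes on every key.
    out.writeInt(key.length);
    for (String field : key) {
      out.writeUTF(field);
    }
  }

  @Override
  public void readExternal(ObjectInput in) throws IOException {
    // Nothing to read for the tag; getMetadataTag() always returns -1.
    int n = in.readInt();
    key = new String[n];
    for (int i = 0; i < n; i++) {
      key[i] = in.readUTF();
    }
  }

  // Serializes and deserializes a key in memory, to check the format round-trips.
  static MapJoinKeySketch roundTrip(MapJoinKeySketch original) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      original.writeExternal(out);
      out.flush();
      ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      MapJoinKeySketch copy = new MapJoinKeySketch();
      copy.readExternal(in);
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```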
3.
In JDBMSinkOperator,
    if (hashTable.cacheSize() > 0) {
      o.setObj(res);
      needNewKey = false;
    }
has no effect.
Even when hashTable.cacheSize() > 0 and needNewKey is true, in the
following code
    if (needNewKey) {
      ...
      hashTable.put(keyObj, valueObj);
    }
keyObj and valueObj are already in hashTable, so the put also has no effect
except moving the value to the head of the MRUList. But at put time, it is
already at the head because of the get().
So ideally, we should move most of the code into the o == null branch:
    if (o == null) {
      ....
      if (metadataValueTag[tag] == -1) {
        .....
      }
      if (needNewKey) { // this is always true here
      }
    } else {
      res = o.getObj();
      res.add(value);
    }
This may improve client performance, which matters because we are now doing
all the processing of the small tables on the client.
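A minimal sketch of the suggested control flow. A LinkedHashMap in access order stands in for the real MRU cache plus JDBM store, and a List<Object> for the value object; all names are hypothetical, and the metadata handling is reduced to a comment. It relies on the premise above that the object returned by get() is the same one the table holds, so appending to it needs no second put():

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the sink's per-row logic; not the real JDBMSinkOperator.
class SinkSketch {
  // accessOrder = true: get() moves an entry to the most-recently-used end.
  private final Map<Object, List<Object>> hashTable =
      new LinkedHashMap<Object, List<Object>>(16, 0.75f, true);

  int puts = 0; // counts put() calls, to show they happen only for new keys

  void process(Object key, Object value) {
    List<Object> res = hashTable.get(key);
    if (res == null) {
      // New key: all setup work (in the real operator, also the
      // metadataValueTag handling) lives in this branch, and this is
      // the only place a put() is needed.
      res = new ArrayList<Object>();
      res.add(value);
      hashTable.put(key, res);
      puts++;
    } else {
      // Existing key: get() already refreshed its recency, so appending
      // to the cached list is enough.
      res.add(value);
    }
  }

  List<Object> rowsFor(Object key) {
    return hashTable.get(key);
  }
}
```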
4.
In JDBMSinkOperator's close(), call hashTable.close() before uploading the
JDBM file. That way, JDBM can do any cleanup work it needs in close() before
the file is uploaded.
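A minimal sketch of the suggested ordering. TableHandle and FileUploader are hypothetical interfaces standing in for the JDBM-backed hash table and whatever code uploads the file; neither is a real Hive or JDBM API:

```java
// Hypothetical stand-in for the JDBM-backed hash table.
interface TableHandle {
  void close();
}

// Hypothetical stand-in for the component that uploads the local file.
interface FileUploader {
  void upload(String localPath);
}

class SinkCloseSketch {
  private final TableHandle hashTable;
  private final FileUploader uploader;
  private final String jdbmFilePath;

  SinkCloseSketch(TableHandle hashTable, FileUploader uploader, String jdbmFilePath) {
    this.hashTable = hashTable;
    this.uploader = uploader;
    this.jdbmFilePath = jdbmFilePath;
  }

  void close() {
    // Close first, so JDBM can flush pending writes and do its own cleanup...
    hashTable.close();
    // ...and only then upload the finished file.
    uploader.upload(jdbmFilePath);
  }
}
```

If hashTable.close() throws, the upload is skipped, so a half-written file is never shipped.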
5.
In JDBMSinkOperator, remove getPersistentFilePath(); there are no references
to it.
6.
In MapJoinOperator's loadJDBM(), remove the line "int alias;".
Also in loadJDBM(), remove this code:
    for (int i = 0; i < localFiles.length; i++) {
      Path path = localFiles[i];
    }
7.
Instead of resolving the file name mapping at runtime, we should do it at
compile time. We need to open a follow-up JIRA for this.
8.
In MapredLocalTask, remove these lines:
    private MapOperator mo;
    private File jdbmFile;
Maybe we should print some progress information in startForward(). That way,
the client will not look unresponsive.
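A minimal sketch of periodic progress reporting while rows are forwarded. The Iterator stands in for the small-table rows the local task reads and the Consumer<String> for the client console; the method name, interval, and message format are hypothetical choices, not the real startForward():

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical sketch; not the real MapredLocalTask code.
class ProgressSketch {
  // Drains the rows, reporting every reportEvery rows and once at the end.
  static long forwardAll(Iterator<?> rows, long reportEvery, Consumer<String> console) {
    if (reportEvery <= 0) {
      throw new IllegalArgumentException("reportEvery must be positive");
    }
    long count = 0;
    while (rows.hasNext()) {
      rows.next(); // the real task would forward this row to its operator tree
      count++;
      if (count % reportEvery == 0) {
        console.accept("Processed " + count + " rows");
      }
    }
    console.accept("Finished: processed " + count + " rows");
    return count;
  }
}
```

Reporting by row count keeps the overhead to one modulo per row; a time-based interval would work equally well.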
AWESOME work! Can you open the follow-up JIRAs for the offline review comments?
> add map joined table to distributed cache
> -----------------------------------------
>
> Key: HIVE-1641
> URL: https://issues.apache.org/jira/browse/HIVE-1641
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Affects Versions: 0.7.0
> Reporter: Namit Jain
> Assignee: Liyin Tang
> Fix For: 0.7.0
>
> Attachments: Hive-1641(3).txt, Hive-1641(4).patch,
> Hive-1641(5).patch, Hive-1641.patch
>
>
> Currently, the mappers directly read the map-joined table from HDFS, which
> makes it difficult to scale.
> We end up getting lots of timeouts once the number of mappers is beyond a
> few thousand, due to the concurrent mappers.
> It would be a good idea to put the mapped file into the distributed cache and
> read from there instead.
--
This message is automatically generated by JIRA.