[ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420146#comment-15420146 ]
Jianfeng Jia commented on ASTERIXDB-1556:
-----------------------------------------

In the dataflow-related algorithm implementations, we usually assume an append-only data flow, which makes the code cleaner and faster. I would suggest considering the approach that [~buyingyi] mentioned last week: build independent hash tables, one for each partition, instead of a single global hash table. We don't need to change the hash table code. When a partition has to spill, we just "delete" that partition's hash table.

> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join
> (> 2), the system generates an out-of-memory exception.
> Since a fuzzy-join is written as 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.
> /// Update: as the discussion went on, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit. So, an out-of-memory
> exception happens during the execution of an external hash group-by operator.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
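
A minimal sketch of the per-partition idea suggested in the comment above, under stated assumptions. This is not AsterixDB or Hyracks code: the class `PartitionedGroupBySketch`, its methods, and the `Partial` type are hypothetical, the aggregate is a simple count, the budget is counted in groups rather than frames, and an in-memory list stands in for each partition's run file. It only illustrates that each partition owns an independent table, that spilling drops a whole table without per-entry deletion, and that later records for a spilled partition are appended rather than hashed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; names and budget accounting are assumptions. */
class PartitionedGroupBySketch {
    /** A partial aggregate (key, count) as it would be written to a run file. */
    static final class Partial {
        final String key;
        final long count;
        Partial(String key, long count) { this.key = key; this.count = count; }
    }

    private final int budget;                      // max in-memory groups over all partitions
    private final List<Map<String, Long>> tables;  // one table per partition; null once spilled
    private final List<List<Partial>> spilled;     // stand-in for per-partition run files
    private int inMemoryGroups = 0;

    PartitionedGroupBySketch(int numPartitions, int budget) {
        this.budget = budget;
        this.tables = new ArrayList<>();
        this.spilled = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            tables.add(new HashMap<>());
            spilled.add(new ArrayList<>());
        }
    }

    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), tables.size());
    }

    void add(String key) {
        int p = partitionOf(key);
        // A new group needs room: spill whole partitions until it fits
        // or until this key's own partition has been spilled.
        while (tables.get(p) != null && !tables.get(p).containsKey(key)
                && inMemoryGroups >= budget) {
            spillLargest();
        }
        Map<String, Long> table = tables.get(p);
        if (table == null) {
            // Partition already spilled: append only, no hash table involved.
            spilled.get(p).add(new Partial(key, 1L));
            return;
        }
        if (table.merge(key, 1L, Long::sum) == 1L) {
            inMemoryGroups++;  // merge returned 1, so this key is a new group
        }
    }

    private void spillLargest() {
        int victim = -1;
        for (int i = 0; i < tables.size(); i++) {
            Map<String, Long> t = tables.get(i);
            if (t != null && (victim < 0 || t.size() > tables.get(victim).size())) {
                victim = i;
            }
        }
        Map<String, Long> t = tables.get(victim);
        for (Map.Entry<String, Long> e : t.entrySet()) {
            spilled.get(victim).add(new Partial(e.getKey(), e.getValue()));
        }
        inMemoryGroups -= t.size();
        tables.set(victim, null);  // "delete" the whole table in one step
    }

    boolean isSpilled(int partition) { return tables.get(partition) == null; }

    int inMemoryGroups() { return inMemoryGroups; }

    /** Count held in memory for the key, or null if its partition was spilled or the key is unseen. */
    Long inMemoryCount(String key) {
        Map<String, Long> t = tables.get(partitionOf(key));
        return t == null ? null : t.get(key);
    }

    int spilledRecords(int partition) { return spilled.get(partition).size(); }
}
```

Because a spill removes an entire table, the shared hash table code never needs a per-entry delete, and the remaining partitions stay untouched. A real operator would still have to re-aggregate each spilled partition in a later pass.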