hemanthumashankar0511 commented on PR #6317:
URL: https://github.com/apache/hive/pull/6317#issuecomment-3931787170

   @abstractdog and @ayushtkn, I wanted to follow up properly on both points 
raised here.
   
   First, @abstractdog, thank you for correcting me on how `HashSet` works! I 
genuinely didn't realize that it always computes `hashCode()` before it ever 
reaches `equals()`. I was wrong to claim the Set check was "mostly just 
comparing memory addresses," and I really appreciate you taking the time to 
explain that clearly.
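
   To make that call order concrete for anyone following along, here is a 
minimal sketch. `Desc`, the counters, and `addTwice` are hypothetical names 
made up for illustration, not Hive classes, and the precise number of 
`equals()` calls is an implementation detail of the JDK's `HashMap`, so only 
the `hashCode()` count should be relied on.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class HashSetCallOrderSketch {
    // Global counters so we can observe which methods HashSet invokes.
    public static int hashCodeCalls = 0;
    public static int equalsCalls = 0;

    // Hypothetical stand-in for a descriptor object; NOT Hive's TableDesc.
    public static final class Desc {
        private final String name;

        public Desc(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            hashCodeCalls++;
            return Objects.hashCode(name);
        }

        @Override
        public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Desc && Objects.equals(name, ((Desc) o).name);
        }
    }

    // Adds both objects to a fresh HashSet and returns
    // {hashCode() calls, equals() calls, resulting set size}.
    public static int[] addTwice(Desc first, Desc second) {
        hashCodeCalls = 0;
        equalsCalls = 0;
        Set<Desc> seen = new HashSet<>();
        seen.add(first);
        seen.add(second);
        return new int[] {hashCodeCalls, equalsCalls, seen.size()};
    }

    public static void main(String[] args) {
        Desc shared = new Desc("test");
        int[] sameInstance = addTwice(shared, shared);
        int[] equalCopies = addTwice(new Desc("test"), new Desc("test"));
        System.out.println("same instance: hashCode=" + sameInstance[0]
            + " equals=" + sameInstance[1] + " size=" + sameInstance[2]);
        System.out.println("equal copies:  hashCode=" + equalCopies[0]
            + " equals=" + equalCopies[1] + " size=" + equalCopies[2]);
    }
}
```

   In both cases `hashCode()` runs once per `add`, even when the very same 
instance is added twice, so the cost of the Set check is dominated by how 
expensive the element's `hashCode()` is.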
   
   Regarding the self-join safety concern, I decided to actually debug this 
locally. I attached a debugger to a test run, put a breakpoint inside 
`configureJobConf`, and inspected the `aliasToPartnInfo` map while executing a 
self-join query:
   
   ```sql
   SELECT * FROM test t1 JOIN test t2 USING(a);
   ```
   
   When I expanded `aliasToPartnInfo` in the debugger, I could see two entries: 
one for alias `t1` and one for alias `t2`. Both `PartitionDesc` objects had 
their `tableDesc` field pointing to the same `@` object ID in the debugger, 
confirming that they reference the same Java object instance in memory.
   
   So, my original safety argument was wrong! I thought that a self-join might 
produce two distinct `TableDesc` instances with different column 
configurations, but that's not what happens. Hive reuses the exact same 
`TableDesc` instance for all aliases of the same underlying table. 
   
   Because of this, `Set<TableDesc>` and `Set<String>` behave identically in 
this scenario: both deduplicate correctly without skipping anything.
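
   As a rough illustration of why the two approaches agree here, the sketch 
below models the self-join case with simplified stand-in classes. 
`TableDescStandIn`, `PartitionDescStandIn`, and the helper methods are 
hypothetical and are not the real Hive classes or the code in this patch; they 
only mirror the `getTableDesc()` / `getTableName()` shape, and the stand-in 
does not override `equals()`/`hashCode()`.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SelfJoinDedupSketch {
    // Simplified stand-in for a table descriptor; NOT Hive's TableDesc.
    public static final class TableDescStandIn {
        private final String tableName;

        public TableDescStandIn(String tableName) {
            this.tableName = tableName;
        }

        public String getTableName() {
            return tableName;
        }
    }

    // Simplified stand-in for a partition descriptor; NOT Hive's PartitionDesc.
    public static final class PartitionDescStandIn {
        private final TableDescStandIn tableDesc;

        public PartitionDescStandIn(TableDescStandIn tableDesc) {
            this.tableDesc = tableDesc;
        }

        public TableDescStandIn getTableDesc() {
            return tableDesc;
        }
    }

    // Models the debugger observation: aliases t1 and t2 share ONE descriptor.
    public static Map<String, PartitionDescStandIn> selfJoinAliases() {
        TableDescStandIn shared = new TableDescStandIn("test");
        Map<String, PartitionDescStandIn> aliasToPartnInfo = new LinkedHashMap<>();
        aliasToPartnInfo.put("t1", new PartitionDescStandIn(shared));
        aliasToPartnInfo.put("t2", new PartitionDescStandIn(shared));
        return aliasToPartnInfo;
    }

    // Counts how many descriptors would be configured when deduplicating by object.
    public static int configuredByInstance(Map<String, PartitionDescStandIn> aliases) {
        Set<TableDescStandIn> seen = new HashSet<>();
        int configured = 0;
        for (PartitionDescStandIn part : aliases.values()) {
            if (seen.add(part.getTableDesc())) {
                configured++; // first time this descriptor is seen
            }
        }
        return configured;
    }

    // Same loop, but deduplicating by table name instead.
    public static int configuredByName(Map<String, PartitionDescStandIn> aliases) {
        Set<String> seen = new HashSet<>();
        int configured = 0;
        for (PartitionDescStandIn part : aliases.values()) {
            if (seen.add(part.getTableDesc().getTableName())) {
                configured++; // first time this table name is seen
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        Map<String, PartitionDescStandIn> aliases = selfJoinAliases();
        System.out.println("by instance: " + configuredByInstance(aliases));
        System.out.println("by name:     " + configuredByName(aliases));
    }
}
```

   With both aliases sharing one descriptor, each loop configures the table 
exactly once; the name-based variant just hashes a `String` instead of the 
whole descriptor.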
   
   I am more than happy to switch to using `Set<String>` via 
`tableDesc.getTableName()` as you suggested. It is definitely lighter, and the 
behavior is exactly the same. I'll update the patch right away.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

