I was recently running some tests with huge page tables. I ran them on two different architectures: x86 and PPC64.
I saw some discussion going on over here so thought of sharing. I was using 3 Cores, 8GB RAM, 2 LUN for filesystem (1 for dbfiles and 1 for logfiles) for these tests... I had dedicated (shared_buffers + 400bytes*max_connection + wal_buffers)/Pagesize [from /proc/meminfo] for huge pages. I kept some overcommit_hugepages to be used by work_mem (max_connection*work_mem)/Pagesize x86_64 bit gave me a benefit of 2-5% for TPC-C workload( I scaled from 1 to 100 users). PPC64 which uses 16MB and 64MB did not give me any benefits in fact the performance degraded as the concurrency of system increased. my 2 cents, hope it helps.