Normally, introducing a sub-VI is trouble-free, and, as you say, it's better to organize the code that way. However, if the code is called repeatedly (thousands of times) and everything needs to finish as fast as possible, the small per-call overhead accumulates and can become significant. In that case you must skip the sub-VI and place the code directly in the caller.
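LabVIEW code is graphical and can't be pasted as text, but the same effect shows up in most languages. As a rough analogy (this is Python, not LabVIEW), the sketch below does identical work twice: once through a small helper called on every iteration, once with the body written inline. The names `scale`, `with_calls` and `inlined` are hypothetical. Absolute timings depend on the machine, so only the relative difference is of interest.

```python
import timeit


def scale(x: float) -> float:
    # Stand-in for a small sub-VI: a tiny amount of work per call.
    return x * 2.0 + 1.0


def with_calls(n: int) -> float:
    total = 0.0
    for i in range(n):
        total += scale(i)  # pays the function-call overhead on every iteration
    return total


def inlined(n: int) -> float:
    total = 0.0
    for i in range(n):
        total += i * 2.0 + 1.0  # same arithmetic, no call
    return total


if __name__ == "__main__":
    n = 100_000
    # Both versions compute exactly the same result.
    assert with_calls(n) == inlined(n)
    t_call = timeit.timeit(lambda: with_calls(n), number=10)
    t_inline = timeit.timeit(lambda: inlined(n), number=10)
    print(f"with calls: {t_call:.3f} s, inlined: {t_inline:.3f} s")
```

The work per call is tiny, so the call overhead dominates; with a heavier body the difference would matter much less.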
(Two days ago I actually suggested to NI a feature for LV: you would organize the code in a sub-VI, or in a new type of object, purely for readability, set it to "flatten when compiled", and the compiler would then flatten the code into the caller to optimize performance.)

When it comes to memory usage, the story is a bit more complicated. Keeping the sub-VI's front panel out of memory prevents some copies from being made; however, even with the front panel removed or not loaded, the sub-VI may still introduce data copies. If the amount of data is large (which is perhaps the case here?), those copies can be costly. Normally they would not cause much of a performance penalty, but if too little memory is available and the system must fall back on virtual memory (disk), performance can drop sharply. Could lack of memory be an issue here? If not, could you post the caller and the sub-VI?
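To get a feel for how quickly extra copies add up, here is a back-of-the-envelope estimate, again sketched in Python. It assumes the data is an array of DBL (8 bytes per element) and treats the number of extra copies as a hypothetical input; how many copies a sub-VI call actually makes depends on how LabVIEW allocates buffers and is not modelled here.

```python
BYTES_PER_DBL = 8  # a LabVIEW DBL is an IEEE 754 double (8 bytes)


def peak_bytes(n_elements: int, extra_copies: int) -> int:
    """Memory held by the original array plus `extra_copies` duplicates of it."""
    return n_elements * BYTES_PER_DBL * (1 + extra_copies)


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical data set: ten million samples
    for copies in range(4):
        print(f"{copies} extra copies: {to_mib(peak_bytes(n, copies)):.0f} MiB")
```

With ten million elements each copy is roughly 76 MiB, so just a few extra copies can be enough to push a machine with little free RAM into paging.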