Hello,

I have not seen the following points addressed in any of the online 
material I have read on Node so far, and so hope to be enlightened by the 
smart and knowledgeable folks who I presume are reading this.


1. Since I/O happens asynchronously in worker threads, it is possible for a 
single Node process to quickly/efficiently accept 1000s of incoming 
requests compared to something like Apache. But surely the outgoing 
responses for each of those requests will take their own time, won't they? 
For example, if an isolated, primarily I/O-bound request takes, say, 
3 seconds to get serviced (with no other load on the system), then when 
hit concurrently with 5000 such requests, won't Node take *a lot* of time 
to service them all *fully*? If this 3-second task happens to involve 
exclusive access to the disk, then it would take 5000 x 3 sec = 15000 
seconds, or over 4 hours of wait to see the response for the last request 
coming out of the Node app. In such scenarios, would it be correct to claim 
that a single-process Node configuration can 'handle' 1000s of requests per 
second (granted, a threaded server like Apache would do a lot worse with 5000 
threads) when all that Node may be doing is simply putting the requests 'on 
hold' till they get *fully* serviced, instead of rejecting them outright 
on their initial arrival? I'm asking this because, as I read up 
on Node, I often hear how Node can address the C10K problem without 
any mention of the specific application setups or 
application types that Node can or cannot handle... other than the broad, 
CPU- vs I/O-bound type of application classification.
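
To make the arithmetic in point 1 concrete, here is a minimal sketch of the 
two extremes I have in mind. The function names are my own (hypothetical), 
and this only models the numbers; it says nothing about what Node actually 
does internally.

```javascript
// Hypothetical helpers: they model the arithmetic only, not Node internals.

// Worst case: the resource is exclusive, so requests are served one at a time
// and the last response appears after requestCount * serviceSeconds.
function lastResponseSerialized(requestCount, serviceSeconds) {
  return requestCount * serviceSeconds;
}

// Best case: every wait overlaps fully, so all requests finish together
// after a single service time.
function lastResponseOverlapped(requestCount, serviceSeconds) {
  return requestCount > 0 ? serviceSeconds : 0;
}

const serialized = lastResponseSerialized(5000, 3);
const overlapped = lastResponseOverlapped(5000, 3);

console.log(`serialized: ${serialized} s (${(serialized / 3600).toFixed(2)} h)`);
console.log(`overlapped: ${overlapped} s`);
```

My question is essentially where between these two extremes a real 
I/O-bound Node application lands.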


2. What about the context switching overhead of the workers in the 
worker-thread pool? If C10K-scale requests hit a Node-based application, won't 
the workers in the worker-thread pool end up context-switching just as much 
as the user threads in the thread pool of a regular threaded server (like 
Apache)? After all, all that would have happened in Node's event thread 
would be quick request parsing and request routing, with the remainder 
(the bulk) of the processing still happening in a worker thread. That 
is, does it really matter (as far as minimization of thread 
context-switching is concerned) whether a request/response is handled from 
start to finish in a single thread (in the manner of a threaded server like 
Apache), or whether it happens transparently in a Node-managed worker 
thread with only minimal work (of request parsing and routing) subtracted 
from it? Ignore here the simpler, single-threaded user model of coding that 
comes with an evented server like Node.


3. If the RDBMS instance (say, MySQL) is co-located on the Node server box, 
then would it be correct to classify a database CRUD operation as a pure 
I/O task? My understanding is that a CRUD operation on a large relational 
database will typically involve heavy-duty CPU and I/O processing, not 
just I/O processing. However, the online material that I've been reading 
seems to label a 'database call' as merely an 'I/O call', which supposedly 
makes your application I/O-bound if that is the only 
thing your application is (mostly) doing.


4. A final question (related to the above themes) that may require 
knowledge of modern hardware and OSes, which I am not fully up to date on. Can 
I/O (on a given I/O device) be done in parallel, or at least concurrently, 
and THUS scale proportionally with user count? Example: 
Suppose I have written a file-serving Node app that serves files from the 
local hard disk, making it a strongly I/O-bound app. If hit with N (~10K, 
i.e. C10K-scale) concurrent file-serving requests, what I think would happen is this: 

   - The event loop would spawn N async file-read requests, and go idle. 
   (Alternatively, Node would have pre-spawned all its workers in the pool on 
   process startup.)
   - Each of the N async file-read requests would get assigned to a worker 
   thread in the worker-thread pool. If N > pool size, then the remainder will 
   be made to wait in some sort of internal queue until a worker is free.
   - The file-read requests would run concurrently at best, or sequentially 
   at worst, but definitely NOT N-way in parallel. That is, even a RAID 
   configuration would be able to service merely a handful of file-read 
   requests in parallel, and certainly not all N.
   - This would result in a large total wait time before the last 
   file-serving request is served *fully*. 
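
The steps above can be sketched as a toy queueing model. This is only my 
mental model, not Node's real scheduler: the function name and the 10 ms 
per-read figure are assumptions of mine, and the pool size of 4 is what I 
understand libuv's default to be (adjustable via UV_THREADPOOL_SIZE).

```javascript
// Hypothetical model: `poolSize` workers pull jobs from a FIFO queue, and
// every job takes exactly `serviceMs`. Returns the finish time of each job.
function simulatePool(jobCount, poolSize, serviceMs) {
  const workerFreeAt = new Array(poolSize).fill(0); // when each worker is next idle
  const finishTimes = [];
  for (let job = 0; job < jobCount; job++) {
    // Hand the next queued job to whichever worker becomes idle first.
    let earliest = 0;
    for (let w = 1; w < poolSize; w++) {
      if (workerFreeAt[w] < workerFreeAt[earliest]) earliest = w;
    }
    workerFreeAt[earliest] += serviceMs;
    finishTimes.push(workerFreeAt[earliest]);
  }
  return finishTimes;
}

// Assumed numbers: 10000 file reads, 4 workers, 10 ms per read.
const finishTimes = simulatePool(10000, 4, 10);
const lastFinishMs = finishTimes[finishTimes.length - 1];
console.log(`last request done after ${lastFinishMs / 1000} s`);
```

Under these assumptions the first request finishes after 10 ms, while the 
last one waits for 2500 rounds of the pool, which is the "large total wait 
time" I am worried about.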

So, if all 4 of these points are true, how can we really say that a 
single-process Node application is good for I/O-bound applications 
(because it scales well)? Can the mere ability to receive a large number of 
incoming requests and keep them all on hold indefinitely while their I/O 
*fully* completes (versus rejecting them outright on their initial 
arrival) be called 'servicing the requests'? Can such an application 
be seen as scaling well with respect to user count?


Many thanks in advance. 

Regards,
/HS
