Hi,

The window size in Spark Streaming is time-based, which means each window can 
contain a different number of elements. For example, suppose you have two (or 
more) related streams and you want to compare them over a specific time 
interval. I am not clear how this would work: although the streams start 
running simultaneously, they might have a different number of elements in each 
time interval.

The output below is for two streams that have the same number of elements and 
ran simultaneously. The leftmost value is the number of elements in each 
window. If we add up the element counts across all windows, the totals are the 
same for both streams, but we can't compare the streams window by window 
because they differ in both window sizes and number of windows.

Can we somehow make windows based on real time values for both streams? Or can 
we make windows based on the number of elements?
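
To make clear what I mean by windows based on the number of elements, here is 
a minimal sketch in plain Python. It is not Spark code and does not use any 
Spark API; the function name count_windows and the window size are 
hypothetical, and I am assuming population variance (dividing by n).

```python
import math
from itertools import islice

def count_windows(values, size):
    """Yield (n, (mean, variance, sd)) for consecutive windows of `size` elements.

    Hypothetical helper for illustration only; the last window may be shorter.
    """
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            break
        n = len(chunk)
        mean = sum(chunk) / n
        # Population variance (divide by n); this is an assumption.
        variance = sum((x - mean) ** 2 for x in chunk) / n
        yield n, (mean, variance, math.sqrt(variance))

# Two toy streams with the same number of elements.
stream1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
stream2 = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]

windows1 = list(count_windows(stream1, 3))
windows2 = list(count_windows(stream2, 3))

# With count-based windows, both streams yield the same window sizes,
# so window i of stream 1 can be compared with window i of stream 2.
for w1, w2 in zip(windows1, windows2):
    print(w1, w2)
```

With windows like these, two streams with the same number of elements would 
always produce the same number of windows with matching sizes.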

(n, (mean, variance, SD))

Stream 1

(7462,(1.0535658165371238,4242.001306434091,65.13064798107025))
(44826,(0.2546925855084064,5042.890184382894,71.0133099100647))
(245466,(0.2857731601728941,5014.411691661449,70.81251084138628))
(154852,(0.21907814309792514,3483.800160602281,59.023725404300606))
(156345,(0.3075668844414613,7449.528181550462,86.31064929399189))
(156603,(0.27785151491351234,5917.809892281489,76.9273026452994))
(156047,(0.18130350363672296,4019.0232843737017,63.39576708561623))


Stream 2

(10493,(0.5554953964547791,1254.883548218503,35.42433553672536))
(180649,(0.21684831234050583,1095.9634245399352,33.1053383087975))
(179994,(0.22048869512317407,1443.0566458182718,37.98758541705792))
(179455,(0.20473330254938552,1623.9538730448216,40.29831104456888))
(269817,(0.16987953223480945,3270.663944782799,57.18971887308766))
(101193,(0.21469292497504766,1263.0879032808723,35.53994799209577))
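
As a quick check that both streams really carry the same number of elements, 
the per-window counts above can be summed. This is a small plain-Python 
sketch; the variable names are mine, and the counts are copied from the 
output above.

```python
# Per-window element counts copied from the output above.
stream1_counts = [7462, 44826, 245466, 154852, 156345, 156603, 156047]
stream2_counts = [10493, 180649, 179994, 179455, 269817, 101193]

total1 = sum(stream1_counts)
total2 = sum(stream2_counts)

# Same total number of elements, but a different number of windows.
print(total1, total2)                              # 921601 921601
print(len(stream1_counts), len(stream2_counts))    # 7 6
```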


Regards,
Laeeq
