Hi John,
I have just started pulling Twitter conversations using Apache Flume, but I
have not started processing the pulled data yet. My answers are below:
1) How large is each JSON document?
Files average from 100 KB to 2 MB. Flume rolls a new file every 1 minute
(which is configurable), so the size depends on the number of events that
occurred during that interval.
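For reference, the one-minute roll is just the HDFS sink's rollInterval in the
Flume agent properties. A minimal sketch of what my setup looks like - the
agent/channel names and the HDFS path here are placeholders, not my actual
config:

```properties
# Hypothetical agent "TwitterAgent" streaming raw tweet events to HDFS
TwitterAgent.sources  = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks    = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
# (consumerKey/consumerSecret/accessToken settings omitted here)

TwitterAgent.channels.MemChannel.type = memory

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://namenode/user/flume/tweets
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
# Roll a new file every 60 seconds; disable size- and count-based rolling
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 60
TwitterAgent.sinks.HDFS.hdfs.rollSize  = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
```

With rollSize and rollCount zeroed out, only the time interval triggers a roll,
which is why file size tracks event volume directly.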
2) Do they tend to be a single JSON doc per file, or multiples per
file?
Multiples per file - the largest file so far (3.2 MB) had about 1,100 JSON docs.
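In case it's useful, this is roughly how I'd count docs per file - a sketch
assuming one JSON document per line (newline-delimited), which is how my files
come out; the function name is just for illustration:

```python
import json


def count_json_docs(path):
    """Count newline-delimited JSON documents in a rolled file.

    Assumes one JSON doc per line; blank or unparseable lines are skipped.
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                json.loads(line)
                count += 1
            except ValueError:
                # Partial/corrupt line (e.g. truncated on roll) - skip it
                continue
    return count
```

If your files pack multiple docs per line, or a doc spans lines, you'd need a
streaming parser instead, but line-at-a-time has been enough for my data so far.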
3) Do the JSON schemas change over time?
No - since it's the standard Twitter API, the schema is stable.
4) Are there interesting public data sets you would recommend for
experiment?
The Twitter API itself.
Thanks,
Lenin
On Tue, Jul 2, 2013 at 9:34 PM, John Lilley wrote:
> I would like to hear your experiences working with large JSON data sets,
> specifically:
>
> 1) How large is each JSON document?
>
> 2) Do they tend to be a single JSON doc per file, or multiples
> per file?
>
> 3) Do the JSON schemas change over time?
>
> 4) Are there interesting public data sets you would recommend
> for experiment?
>
> Thanks
>
> John
>