Hello (everything outside Zeppelin),

I have started work on the Common Crawl datasets and began by looking only at the data for May 2016. Of the three formats available, I chose WET (plain text). The May data alone is divided into segments, and there are 24492 of them. I downloaded just the first segment and got 432 MB of data. The problem is that my laptop is a very modest machine (Core 2 Duo, 3 GB RAM): even opening the downloaded file in LibreOffice Writer filled the RAM completely and hung the machine, so bringing the data directly into Zeppelin, or analyzing it there, seems impossible. As far as I know, there are two ways I can proceed:
1) Buy a new laptop with more RAM and a faster processor, OR
2) Choose another dataset.

I have no problem with either option, or with anything else you might suggest, but please let me know which way to proceed so that I can work at speed. Meanwhile, I will read more papers and publications on the possibilities of analyzing Common Crawl data.

Thanks,
Anish
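P.S. One thing I might still try on the current machine is streaming the gzipped WET file line by line instead of opening it whole, so memory use stays roughly constant no matter how big the segment is. A minimal Python sketch (the function name and the record-counting heuristic are my own; I am assuming the segment is the standard gzip-compressed WET format, where each document record starts with a "WARC-Type: conversion" header line):

```python
import gzip

def count_wet_records(path):
    """Count document records in a WET file by streaming it line
    by line, so memory use stays constant regardless of file size."""
    count = 0
    # "rt" decodes to text on the fly; errors="replace" tolerates
    # the occasional bad byte that crawled web pages tend to contain
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            # each extracted-text record in a WET file is preceded
            # by a header block whose WARC-Type is "conversion"
            if line.startswith("WARC-Type: conversion"):
                count += 1
    return count
```

Counting records is just a stand-in here; the same loop shape would let me tokenize or filter the text one record at a time without ever holding the full 432 MB in RAM.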
