You can write a simple Python script to process the 1.5 GB dataset and use the pandas library to build your predictive model.
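A file of this size may not fit comfortably in memory on a single machine, so one option is to read it in chunks. Below is a minimal sketch of that idea, not a definitive implementation. The function name `count_by_category`, the file name `crimes.csv`, and the column name `Primary Type` are assumptions for illustration; substitute whatever your dataset actually uses.

```python
from collections import Counter

import pandas as pd


def count_by_category(csv_path, column, chunksize=100_000):
    """Count rows per distinct value of `column`, reading the CSV in chunks.

    `csv_path` and `column` are placeholders; they are assumptions, not
    names taken from the actual crime dataset.
    """
    totals = Counter()
    # usecols loads only the column we need; chunksize makes read_csv
    # return an iterator of DataFrames instead of one large DataFrame.
    for chunk in pd.read_csv(csv_path, usecols=[column], chunksize=chunksize):
        # value_counts() tallies this chunk; Counter.update adds the tallies.
        totals.update(chunk[column].value_counts().to_dict())
    return dict(totals)


# Hypothetical usage (file and column names are assumed):
# counts = count_by_category("crimes.csv", "Primary Type")
```

Per-category counts like these are only a first look at the data; the same chunked loop can be adapted to build whatever features the model needs.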
Thanks
Best Regards

On Fri, Dec 4, 2015 at 3:02 PM, Chintan Bhatt <chintanbhatt...@charusat.ac.in> wrote:
> Hi,
> I'm very interested in building a predictive model using crime data
> (from 2001 to the present; it is a large .csv file of about 1.5 GB) in
> Spark on Hortonworks.
> Can anyone tell me how to start?
>
> --
> CHINTAN BHATT <http://in.linkedin.com/pub/chintan-bhatt/22/b31/336/>
> Assistant Professor,
> U & P U Patel Department of Computer Engineering,
> Chandubhai S. Patel Institute of Technology,
> Charotar University of Science And Technology (CHARUSAT),
> Changa-388421, Gujarat, INDIA.
> http://www.charusat.ac.in
> *Personal Website*: https://sites.google.com/a/ecchanga.ac.in/chintan/