Contributing to Spark in GSoC 2017
Hello,

I am Krishna, currently a second-year Masters student (MSc in Data Mining) in Barcelona, studying at the Université Polytechnique de Catalogne. I know it's a little early for GSoC, but I wanted to get a head start working with the Spark community. Is there anyone who will be mentoring for GSoC 2017? Could anyone please guide me on how to go about it?

Related experience: my Masters is mostly focused on data mining and machine learning techniques. Before my Masters, I was a data engineer with IBM (India), where I was responsible for managing a 50-node Hadoop cluster for more than a year. Most of my time was spent writing and optimising ETL (Apache Pig) jobs; our daily batch job aggregated more than 30 GB of CDR and weblog data in the cluster. I am most comfortable with Python and R (not a Scala expert, but I am sure I can pick it up quickly).

My CV can be viewed at the link below:
https://github.com/krishnakalyan3/Resume/raw/master/Resume.pdf

My Spark pull requests:
https://github.com/apache/spark/pulls?utf8=%E2%9C%93=is%3Apr%20author%3Akrishnakalyan3%20

Thank you so much,
Krishna
Re: Running Unit Tests in pyspark failure
I could resolve this by passing the argument below:

./python/run-tests --python-executables=python2.7

Thanks,
Krishna

On Thu, Nov 3, 2016 at 4:16 PM, Krishna Kalyan <krishnakaly...@gmail.com> wrote:
> Hello,
> I am trying to run unit tests on pyspark.
>
> When I try to run the unit tests I am faced with errors.
>
> krishna@Krishna:~/Experiment/spark$ ./python/run-tests
> Running PySpark tests. Output is in /Users/krishna/Experiment/spark/python/unit-tests.log
> Will test against the following Python executables: ['python2.6']
> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> Please install unittest2 to test with Python 2.6 or earlier
> Had test failures in pyspark.sql.tests with python2.6; see logs.
>
> And when I try to install unittest2, it says the requirement is already satisfied.
>
> krishna@Krishna:~/Experiment/spark$ sudo pip install --upgrade unittest2
> Password:
> Requirement already up-to-date: unittest2 in /usr/local/lib/python2.7/site-packages
> Requirement already up-to-date: argparse in /usr/local/lib/python2.7/site-packages (from unittest2)
> Requirement already up-to-date: six>=1.4 in /usr/local/lib/python2.7/site-packages (from unittest2)
> Requirement already up-to-date: traceback2 in /usr/local/lib/python2.7/site-packages (from unittest2)
> Requirement already up-to-date: linecache2 in /usr/local/lib/python2.7/site-packages (from traceback2->unittest2)
>
> Help!
>
> Thanks,
> Krishna
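For anyone hitting the same default-executable problem, a small portable helper along these lines can pick the first Python that actually exists on the machine before it is handed to run-tests. The helper name is mine, not part of Spark:

```shell
#!/bin/sh
# pick_python: print the first executable from the argument list that
# exists on PATH (hypothetical helper, not part of Spark itself)
pick_python() {
  for p in "$@"; do
    if command -v "$p" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  return 1
}

# e.g. prefer python2.7, fall back to whatever "python" resolves to
pick_python python2.7 python
```

Then something like ./python/run-tests --python-executables="$(pick_python python2.7 python)" should test against an interpreter that is actually installed.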
Running Unit Tests in pyspark failure
Hello,
I am trying to run unit tests on pyspark.

When I try to run the unit tests I am faced with errors.

krishna@Krishna:~/Experiment/spark$ ./python/run-tests
Running PySpark tests. Output is in /Users/krishna/Experiment/spark/python/unit-tests.log
Will test against the following Python executables: ['python2.6']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
Please install unittest2 to test with Python 2.6 or earlier
Had test failures in pyspark.sql.tests with python2.6; see logs.

And when I try to install unittest2, it says the requirement is already satisfied.

krishna@Krishna:~/Experiment/spark$ sudo pip install --upgrade unittest2
Password:
Requirement already up-to-date: unittest2 in /usr/local/lib/python2.7/site-packages
Requirement already up-to-date: argparse in /usr/local/lib/python2.7/site-packages (from unittest2)
Requirement already up-to-date: six>=1.4 in /usr/local/lib/python2.7/site-packages (from unittest2)
Requirement already up-to-date: traceback2 in /usr/local/lib/python2.7/site-packages (from unittest2)
Requirement already up-to-date: linecache2 in /usr/local/lib/python2.7/site-packages (from traceback2->unittest2)

Help!

Thanks,
Krishna
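The mismatch above (pip reports unittest2 under python2.7's site-packages, while run-tests is invoking python2.6) can be confirmed with a quick import check against each interpreter. This is only a diagnostic sketch; the interpreter names are whatever happens to be installed locally:

```shell
#!/bin/sh
# check_module: report whether a module is importable under a given
# interpreter; pip may have installed it for a different Python than
# the one the test runner uses
check_module() {
  if "$1" -c "import $2" 2>/dev/null; then
    echo "$2 ok under $1"
  else
    echo "$2 MISSING under $1"
  fi
}

check_module python2.7 unittest2
check_module python2.6 unittest2
```

If the module shows up as ok under one interpreter and MISSING under the other, the fix is either to install it for the interpreter the runner uses (python2.6 -m pip install unittest2) or, as above, to point run-tests at the interpreter that already has it.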
Contributing to PySpark
Hello,
I am a Masters student. Could someone please let me know how to set up my development environment to contribute to PySpark? The questions I had were:
a) Should I use IntelliJ IDEA or PyCharm?
b) How do I test my changes?

Regards,
Krishna
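Not a definitive answer, but before choosing an IDE it is worth confirming the basic toolchain is on PATH. The tool list below is my assumption of what a Spark build needs (git, a JDK, Python); the authoritative prerequisites are in Spark's building documentation:

```shell
#!/bin/sh
# need: report whether a required tool is on PATH (illustrative check
# only; consult Spark's building docs for the real requirements)
need() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: MISSING"
  fi
}

need git
need java
need python
```

Once those report found, a typical loop is to build Spark once (e.g. build/mvn -DskipTests package) and then run ./python/run-tests after each change; either IDE works for editing, since the PySpark tests are driven from the command line anyway.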