Hi everyone,

I am trying to manipulate MySQL tables from Spark. I do not want to move 
these tables from MySQL into Spark, as they can easily get very big; ideally 
the data stays in the database where it is stored. For me, Spark is only used 
to speed up the read and write process (I am more a data analyst than an 
application developer), so I did not install Hadoop. People here have helped 
me a lot, but I still cannot connect Spark to MySQL. Possible causes include, 
for instance, the Java version, the location of the Java files, the location 
of the connector files, the MySQL version, the environment variable settings, 
the choice of JDBC vs. ODBC, and so on. My questions are:

1. Do we need to install Hadoop and Java before installing Spark?

2. Which versions of these packages are known to be stable for a successful 
installation and connection, in anyone's experience? (The solutions I found 
online may have worked on older versions of these packages, but they no 
longer seem to work in my case. I'm on a Mac, by the way.)

3. So far, the only approach that has worked for me is using the sqldf 
package with SparkR to connect to MySQL. But does that mean Spark is actually 
doing the work (speeding up the processing) when I run SQL queries through 
sqldf on SparkR?
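To make the connection question concrete, here is the kind of setup I have been attempting. The connector coordinate and version below are placeholders, not my exact setup, so please adjust for your own MySQL version:

```shell
# Launch SparkR and let Spark fetch the MySQL JDBC driver itself,
# instead of copying the connector jar into a special location by hand.
# NOTE: the Maven coordinate/version here is an assumption -- use whatever
# connector release matches your MySQL server.
./bin/sparkR --packages mysql:mysql-connector-java:8.0.28
```

Inside that SparkR session I would then expect something like `df <- read.jdbc("jdbc:mysql://localhost:3306/mydb", "mytable", user = "me", password = "...")` (host, database, table, and credentials are placeholders) to return a SparkDataFrame backed by the MySQL table, without copying the whole table into Spark first. Is that the right pattern?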

I hope I have described my questions clearly. Thank you very much for the help.

Best regards,

YA

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
