Re: Data from PostgreSQL to Spark
You can have Spark read from PostgreSQL through the data access API. Do you have any concerns with that approach, since you mention copying that data into HBase?

From: Jeetendra Gangele
Sent: Monday, July 27, 6:00 AM
Subject: Data from PostgreSQL to Spark
To: user

Hi All,

I have a use case where I am consuming events from RabbitMQ using Spark Streaming. Each event has some fields on which I want to query PostgreSQL, bring back the matching data, join it with the event data, and put the aggregated result into HDFS, so that I can run analytics queries over it using Spark SQL.

My question is about the PostgreSQL data: it is production data, so I don't want to hit it too many times. At any given second I may have 3000 events, which means I would need to fire 3000 parallel queries against PostgreSQL, and this volume keeps growing, so my database would go down.

I can't migrate the PostgreSQL data since lots of systems use it, but I could copy it into a NoSQL store like HBase and query HBase instead. The issue there is: how can I make sure HBase has up-to-date data?

Can anyone suggest the best approach/method to handle this case?

Regards,
Jeetendra
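For reference, here is a minimal sketch of what reading a PostgreSQL table through Spark's JDBC data source can look like, with the result cached so streaming micro-batches can join against it instead of querying the database directly. The host, database, table name and credentials are hypothetical placeholders, and it assumes the PostgreSQL JDBC driver jar is on the Spark classpath (e.g. passed via --jars):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="postgres-lookup")
sqlContext = SQLContext(sc)

# Load the lookup table from PostgreSQL via the JDBC data source.
# Connection details below are placeholders -- replace with your own.
lookup_df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://db-host:5432/proddb",
    dbtable="public.event_lookup",
    user="spark_reader",
    password="secret",
    driver="org.postgresql.Driver",
).load()

# Cache the lookup data so each streaming micro-batch joins against the
# in-memory copy instead of firing thousands of queries at PostgreSQL.
lookup_df.cache()
lookup_df.registerTempTable("event_lookup")

Joining each micro-batch against the cached DataFrame keeps the per-event query load off the production database; you would still need to decide how often to re-read the table if the source data changes, which touches on the same freshness question you raise for HBase.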
Re: PYSPARK_DRIVER_PYTHON="ipython" spark/bin/pyspark Does not create SparkContext
Hmm, it should work when you run `PYSPARK_DRIVER_PYTHON="ipython" spark/bin/pyspark`.

PYTHONSTARTUP is a Python environment variable: the interpreter itself executes the file it points to when it starts an interactive session, which is why you won't find it referenced anywhere else in the Spark source.
https://docs.python.org/2/using/cmdline.html#envvar-PYTHONSTARTUP

On Sun, Jul 26, 2015 at 4:06 PM -0700, "Zerony Zhao" wrote:

Hello everyone, I have a newbie question.

$SPARK_HOME/bin/pyspark will create a SparkContext automatically:

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/

Using Python version 2.7.3 (default, Jun 22 2015 19:33:41)
SparkContext available as sc, HiveContext available as sqlContext.

But when using IPython as the driver, `PYSPARK_DRIVER_PYTHON="ipython" spark/bin/pyspark` does not create a SparkContext automatically; I have to execute execfile('spark_home/python/pyspark/shell.py') myself. Is this by design?

I read the bash script bin/pyspark and noticed the line:

export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"

But I searched the whole Spark source code and the variable PYTHONSTARTUP is never used anywhere else, so I could not understand when PYTHONSTARTUP is executed.

Thank you.
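If your IPython version does not execute the file pointed to by PYTHONSTARTUP, a workaround is to recreate by hand what python/pyspark/shell.py sets up. A rough sketch, assuming bin/pyspark has already put the pyspark and py4j packages on the Python path before launching the driver:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Roughly what shell.py does: build a SparkContext named "sc" and a SQL
# context named "sqlContext" (shell.py uses HiveContext when Spark was
# built with Hive support, falling back to SQLContext otherwise).
conf = SparkConf().setAppName("PySparkShell")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

Running execfile() on shell.py, as you already do, achieves the same thing; the sketch above just shows there is nothing more to it than creating those two objects.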