Re: Data from PostgreSQL to Spark

2015-07-27 Thread felixcheung_m
You can have Spark read from PostgreSQL through the data access API. Do you 
have any concerns with that approach, since you mention copying that data into 
HBase?
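
A minimal sketch of that approach, reading a PostgreSQL table into a DataFrame 
through the JDBC data source (Spark 1.4-era API). The host, database, table and 
credentials below are placeholders, and the PostgreSQL JDBC driver jar has to be 
on the Spark classpath:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="pg-read-example")
sqlContext = SQLContext(sc)

# Expose the PostgreSQL table as a DataFrame; Spark issues the SQL over JDBC.
pg_df = sqlContext.read.format("jdbc").options(
    url="jdbc:postgresql://pg-host:5432/proddb",
    dbtable="public.reference_data",
    user="spark_reader",
    password="secret",
    driver="org.postgresql.Driver",
).load()

pg_df.show()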



From: Jeetendra Gangele

Sent: Monday, July 27, 6:00 AM

Subject: Data from PostgreSQL to Spark

To: user



Hi All 



I have a use case where I am consuming events from RabbitMQ using Spark 
Streaming. Each event has some fields on which I want to query PostgreSQL, 
bring back the matching data, join it with the event data, and put the 
aggregated data into HDFS, so that I can run analytics queries over this data 
using Spark SQL. 
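
A rough sketch of such a pipeline, per 1-second micro-batch; the RabbitMQ 
receiver is stubbed with a socket stream, and the hosts, table and column names 
are illustrative placeholders:

from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="event-enrichment")
sqlContext = SQLContext(sc)
ssc = StreamingContext(sc, 1)  # 1-second micro-batches

# PostgreSQL reference table exposed as a DataFrame via the JDBC data source.
pg_df = sqlContext.read.jdbc(
    url="jdbc:postgresql://pg-host:5432/proddb",
    table="public.reference_data",
    properties={"user": "spark_reader", "password": "secret"},
)

# Stand-in for a RabbitMQ receiver; each line is treated as one event key.
events = ssc.socketTextStream("localhost", 9999)

def enrich_and_store(rdd):
    if rdd.isEmpty():
        return
    # Join the batch of events with the PostgreSQL data, then append to HDFS.
    events_df = sqlContext.createDataFrame(rdd.map(lambda k: Row(event_key=k)))
    joined = events_df.join(pg_df, events_df.event_key == pg_df.ref_key)
    joined.write.mode("append").parquet("hdfs:///data/enriched_events")

events.foreachRDD(enrich_and_store)
ssc.start()
ssc.awaitTermination()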



My question is that the PostgreSQL data is production data, so I don't want to 
hit it too many times. 



At any given second I may have 3000 events, which means I would need to fire 
3000 parallel queries against PostgreSQL, and this data keeps on growing, so my 
database will go down. 


  


I can't migrate this PostgreSQL data since lots of systems use it, but I can 
copy the data to a NoSQL store like HBase and query HBase instead. The issue 
there is: how can I make sure that HBase has up-to-date data? 



Can anyone suggest the best approach/method to handle this case? 




Regards 


Jeetendra 

Re: PYSPARK_DRIVER_PYTHON="ipython" spark/bin/pyspark Does not create SparkContext

2015-07-27 Thread felixcheung_m
Hmm, it should work when you run `PYSPARK_DRIVER_PYTHON="ipython" 
spark/bin/pyspark`.



PYTHONSTARTUP is a Python environment variable, read by the Python interpreter 
itself when it starts an interactive session (which is why it never appears in 
the Spark source):


https://docs.python.org/2/using/cmdline.html#envvar-PYTHONSTARTUP
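
If the SparkContext still is not created, the objects the startup file would 
define can be built by hand inside IPython; a minimal sketch of roughly what 
python/pyspark/shell.py sets up (the app name is just a label):

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Roughly what the interactive shell startup file creates as `sc` and `sqlContext`.
conf = SparkConf().setAppName("PySparkShell")
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)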




On Sun, Jul 26, 2015 at 4:06 PM -0700, "Zerony Zhao"  wrote:
Hello everyone,

I have a newbie question.

$SPARK_HOME/bin/pyspark will create SparkContext automatically.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/

Using Python version 2.7.3 (default, Jun 22 2015 19:33:41)
SparkContext available as sc, HiveContext available as sqlContext.


But when using IPython as the driver,

PYSPARK_DRIVER_PYTHON="ipython" spark/bin/pyspark

the SparkContext is not created automatically. I have to execute

execfile('spark_home/python/pyspark/shell.py')

Is this by design?

I read the bash script bin/pyspark and noticed the line:

export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"

But I searched the whole Spark source code and the variable PYTHONSTARTUP is
never used, so I could not understand when PYTHONSTARTUP is executed.

Thank you.