Re: redshift spark

2015-06-17 Thread Xiangrui Meng
Hi Hafiz,

As Ewan mentioned, the path is the path to the S3 files unloaded from Redshift. This is a more scalable way to get a large amount of data from Redshift than via JDBC. I'd recommend using the SQL API instead of the Hadoop API (https://github.com/databricks/spark-redshift).

Best,
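For anyone following along, here is a minimal sketch of the unload step that produces those S3 files. It only builds the UNLOAD statement as a string and does not connect to Redshift. The helper name `build_unload_statement` and the role ARN are hypothetical, and the `IAM_ROLE` authorization clause is an assumption on my part, since the thread does not show how access to the bucket was granted. The query and bucket prefix are the ones from Ewan's example.

```python
def build_unload_statement(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift UNLOAD statement as a string (nothing is executed).

    Hypothetical helper for illustration; the authorization clause is an
    assumption, the thread itself does not show one.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with s3://")
    # UNLOAD takes the query as a quoted string, so any single quotes
    # inside the query are doubled.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}'"
    )


statement = build_unload_statement(
    "select * from venue",
    "s3://mybucket/tickit/unload/",
    "arn:aws:iam::123456789012:role/HypotheticalUnloadRole",  # placeholder ARN
)
print(statement)
```

The S3 prefix passed here is the same path you would then hand to the Spark side as the location of the unloaded files.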

RE: redshift spark

2015-06-05 Thread Ewan Leith
That project is for reading in data from Redshift table exports, which are written to S3 by running commands in Redshift like this:

    unload ('select * from venue') to 's3://mybucket/tickit/unload/'

http://docs.aws.amazon.com/redshift/latest/dg/t_Unloading_tables.html

The path in the parameters below is