Re: reading data from s3

2016-12-09 Thread Sudev A C
Hi Hitesh,

The schema of a table is inferred automatically when you read from a JSON
file, whereas when you read from a plain text file you have to supply a
schema for the table you want to create (JSON carries its schema within
it).
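
For example, reading JSON needs no schema at all. A minimal sketch against
the Spark 1.x Java API used in this thread, assuming sql is your SQLContext
and the path is illustrative:

  // JSON records carry their own structure, so the schema is inferred:
  DataFrame d = sql.read().json("s3://some-bucket/records.json");
  d.printSchema();                // shows the inferred schema
  d.registerTempTable("records");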

You can create data frames and register them as tables. There are two ways
to supply the schema (see the sketch after the list):

1. Inferring the schema using reflection

2. Programmatically specifying the schema
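
Option 1 maps a JavaBean class onto the rows with
sql.createDataFrame(rdd, MyBean.class); option 2 looks roughly like the
hedged sketch below (Spark 1.x Java API; the column names, the comma
delimiter, and the file path are assumptions for illustration):

  import java.util.Arrays;
  import org.apache.spark.api.java.JavaRDD;
  import org.apache.spark.sql.DataFrame;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.RowFactory;
  import org.apache.spark.sql.types.DataTypes;
  import org.apache.spark.sql.types.StructType;

  // A plain text file has no structure, so describe the columns yourself:
  StructType schema = DataTypes.createStructType(Arrays.asList(
      DataTypes.createStructField("name", DataTypes.StringType, true),
      DataTypes.createStructField("age", DataTypes.StringType, true)));

  // read().text() yields a single string column; split each line into
  // fields and wrap them in Rows that match the schema above.
  JavaRDD<Row> rows = sql.read().text("s3://some-bucket/people.txt")
      .javaRDD()
      .map(line -> {
        String[] parts = line.getString(0).split(",");
        return RowFactory.create(parts[0], parts[1]);
      });

  DataFrame people = sql.createDataFrame(rows, schema);
  people.registerTempTable("people");
  sql.sql("SELECT name FROM people").show();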


Also, you may use a package like spark-csv to infer the schema from a CSV
file, as shown below:
https://github.com/databricks/spark-csv#sql-api
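
Through its DataFrame reader that looks like the following (adapted from the
spark-csv README; the file path and option values are illustrative):

  // spark-csv samples the file to infer column names and types:
  DataFrame cars = sql.read()
      .format("com.databricks.spark.csv")
      .option("header", "true")        // first line holds column names
      .option("inferSchema", "true")   // guess column types from the data
      .load("s3://some-bucket/cars.csv");
  cars.registerTempTable("cars");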

Thanks
Sudev



reading data from s3

2016-12-08 Thread Hitesh Goyal
Hi team,
I want to read a text file from S3. I am doing it using a DataFrame, like
below:

  DataFrame d = sql.read().text("s3://my_first_text_file.txt");
  d.registerTempTable("table1");
  DataFrame d1 = sql.sql("SELECT * FROM table1");
  d1.printSchema();
  d1.show();

But it is not registering the text file as a temp table that I can run SQL
queries against. Can't I do this with a text file? Or if I can, please
suggest a way to do it.

If I try the same thing with a JSON file, it works.

Regards,
Hitesh Goyal
Simpli5d Technologies
Cont No.: 9996588220



Re: Directly reading data from S3 to EC2 with PySpark

2015-09-15 Thread ayan guha
Also, you can set the Hadoop conf through the jsc.hadoopConf property. Do a
dir(sc) to see the exact property name.
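
For reference, the same idea in the Java API (a hedged sketch; the s3n key
names are the classic Hadoop ones and the credential strings are
placeholders; PySpark reaches the same configuration object through the
SparkContext, as suggested above):

  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaSparkContext;

  JavaSparkContext jsc =
      new JavaSparkContext(new SparkConf().setAppName("s3-read"));

  // Hand the AWS credentials to Hadoop's S3 filesystem instead of
  // embedding them in the URL:
  jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
  jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");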


Re: Directly reading data from S3 to EC2 with PySpark

2015-09-15 Thread Gourav Sengupta
Hi,

If you start your EC2 nodes with the correct IAM roles (the default in most
cases, depending on your needs), you should be able to work with S3 and all
other AWS resources without supplying any keys.

I have been doing that for some time now and have not faced any issues yet.


Regards,
Gourav





Re: Directly reading data from S3 to EC2 with PySpark

2015-09-15 Thread Cazen
Good day junHyeok,

Did you set HADOOP_CONF_DIR? It seems that Spark cannot find the AWS key
properties.

If it doesn't work after setting that, how about exporting AWS_ACCESS_KEY_ID
and AWS_SECRET_ACCESS_KEY before running the PySpark shell?

BR


