Read all the columns from a file in Spark SQL

2014-07-16 Thread pandees waran
Hi,

I am a newbie to Spark SQL and I would like to know how to read all the
columns from a file in Spark SQL. I have referred to the programming guide
here:
http://people.apache.org/~tdas/spark-1.0-docs/sql-programming-guide.html

The example says:

case class Person(name: String, age: Int)

val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

But instead of explicitly specifying p(0), p(1), I would like to read all
the columns from the file. It would be difficult if my source dataset has a
large number of columns.

Is there any shortcut for that?

Also, instead of a single file, I would like to read multiple files that
share a similar structure from a directory.

Could you please share your thoughts on this?

It would be great if you could share any documentation that covers these
topics.

Thanks


Re: Read all the columns from a file in Spark SQL

2014-07-16 Thread Michael Armbrust
I think what you might be looking for is the ability to programmatically
specify the schema, which is coming in 1.1.

Here's the JIRA: SPARK-2179
<https://issues.apache.org/jira/browse/SPARK-2179>
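
Roughly, it will look something like this (a sketch against the current 1.1
branch, so untested and subject to change before the release):

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// The schema is encoded in a string instead of a case class.
val schemaString = "name age"

// Generate the schema from the string of column names.
val schema = StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

// Convert each input line to a Row with one entry per column.
val rowRDD = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Row(p(0), p(1).trim))

// Apply the schema to the RDD of Rows and register the result as a table.
val people = sqlContext.applySchema(rowRDD, schema)
people.registerTempTable("people")

As for reading multiple files: sc.textFile also accepts a directory or a
glob pattern (e.g. sc.textFile("data/*.csv")), so files that share a
structure can all be read in one call.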



Re: Read all the columns from a file in Spark SQL

2014-07-17 Thread Brad Miller
Hi Pandees,

You may also be helped by looking into the ability to read and write
Parquet files, which is available in the present release. Parquet files
allow you to store columnar data in HDFS, and because the schema is stored
alongside the data, Spark "infers" the schema from the Parquet file
itself. In pyspark, some of the methods you'd be interested in are
"parquetFile" and "inferSchema" in SQLContext, and "saveAsParquetFile" in
SchemaRDD.
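
In Scala the round trip looks roughly like this (an untested sketch
against the 1.0 API):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// Implicitly convert an RDD of case classes to a SchemaRDD.
import sqlContext.createSchemaRDD

case class Person(name: String, age: Int)

val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// Write the data out as Parquet; the schema travels with the file.
people.saveAsParquetFile("people.parquet")

// Read it back: no case class needed, the schema comes from the file.
val parquetPeople = sqlContext.parquetFile("people.parquet")
parquetPeople.registerAsTable("parquetPeople")
sqlContext.sql("SELECT name FROM parquetPeople").collect().foreach(println)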

Hope that helps.
-Brad
