Hello Kevin,

You can take a look at our generic load/save functions
<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions>.

For example, you can use

val df = sqlContext.load("/myData", "parquet")

to load a Parquet dataset stored in "/myData" as a DataFrame
<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes>.

You can use it to load data stored in various formats, like JSON (Spark
built-in), Parquet (Spark built-in), Avro
<https://github.com/databricks/spark-avro>, and CSV
<https://github.com/databricks/spark-csv>.
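
For an external format, you can pass the data source name and options
instead of a format shorthand. A minimal sketch, assuming the spark-csv
package is on your classpath (the path and options here are hypothetical):

// load a CSV file with a header row via the spark-csv data source
val csvDf = sqlContext.load(
  "com.databricks.spark.csv",
  Map("path" -> "/myData.csv", "header" -> "true"))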

Thanks,

Yin

On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin <yun...@ebay.com> wrote:

>  Hi, Paul
>
> You are right.
>
> The story is that we have a lot of Pig load functions to load our
> different data.
>
> And now we want to use Spark to read and process this data.
>
> So we want to figure out a way to reuse our existing load functions in
> Spark to read this data.
>
> Any ideas?
>
> Best Regards,
>
> Kevin.
>
> *From:* Paul Brown [mailto:p...@mult.ifario.us]
> *Sent:* March 24, 2015 4:11
> *To:* Dai, Kevin
> *Subject:* Re: Use pig load function in spark
>
> The answer is "Maybe, but you probably don't want to do that."
>
> A typical Pig load function is devoted to bridging external data into
> Pig's type system, but you don't really need to do that in Spark because it
> is (thankfully) not encumbered by Pig's type system.  What you probably
> want to do is use native Spark facilities (e.g., textFile) coupled with the
> parsing logic from your Pig load function that turns your external data
> into an RDD.
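>
> A minimal sketch of that approach (the tab-delimited layout and the
> Record class here are hypothetical):
>
> case class Record(id: Long, name: String)
>
> // read raw lines, then apply the parsing logic a Pig loader would
> val records = sc.textFile("/myData")
>   .map(_.split("\t"))
>   .map(fields => Record(fields(0).toLong, fields(1)))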
>
>   —
> p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/
>
> On Mon, Mar 23, 2015 at 2:29 AM, Dai, Kevin <yun...@ebay.com> wrote:
>
> Hi, all
>
> Can Spark use Pig's load functions to load data?
>
> Best Regards,
>
> Kevin.
>
