RE: Pig on Spark

Sameer Tilak Mon, 10 Mar 2014 12:54:09 -0700

Hi Mayur,

We are planning to upgrade our distribution from MR1 to MR2 (YARN), and the 
goal is to get Spork set up next month. I will keep you posted. Can you please 
keep me informed about your progress as well?

From: mayur.rust...@gmail.com
Date: Mon, 10 Mar 2014 11:47:56 -0700
Subject: Re: Pig on Spark
To: user@spark.apache.org


Hi Sameer,

Did you make any progress on this? My team is also trying it out and would 
love to know some details on your progress.

Mayur Rustagi


Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi





On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak <ssti...@live.com> wrote:





Hi Aniket,

Many thanks! I will check this out.

Date: Thu, 6 Mar 2014 13:46:50 -0800
Subject: Re: Pig on Spark
From: aniket...@gmail.com


To: user@spark.apache.org; tgraves...@yahoo.com

There is some work in progress to make this run on YARN at 
https://github.com/aniket486/pig. (So, compile Pig with ant -Dhadoopversion=23.)


You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to find 
out what sort of env variables you need (sorry, I haven't been able to clean 
this up; it is still in progress). There are a few known issues with this; I 
will work on fixing them soon.
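
The build step mentioned above can be sketched roughly as follows. This is a minimal sketch, not a tested recipe: the branch name `spork` is an assumption taken from the pig-spark URL in this message, the helper names `run` and `build_spork` are hypothetical, and the environment variables that pig-spark expects are not reproduced here (check that script for them). By default the commands are only printed; set RUN=1 to execute them.

```shell
#!/bin/sh
# Sketch: build the Spork fork of Pig against Hadoop 2 (YARN).
# Assumption: the work lives on the "spork" branch, as the pig-spark URL suggests.

# Hypothetical helper: print the command, or execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Hypothetical helper: clone the fork, enter it, and build with the
# -Dhadoopversion=23 flag mentioned in this message.
build_spork() {
  run git clone -b spork https://github.com/aniket486/pig.git pig &&
    run cd pig &&
    run ant -Dhadoopversion=23
}

build_spork
```

Running the script as-is prints the three commands so they can be reviewed first; running it with RUN=1 in the environment performs the clone and the build.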



Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
3. Algebraic UDFs don't work (spork-fix in-progress)
4. Group by rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053, then you can put 
   pig-withouthadoop.jar as SPARK_JARS in SparkContext along with udf jars)

~Aniket






On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves <tgraves...@yahoo.com> wrote:


I had asked a similar question on the dev mailing list a while back (Jan 22nd). 



See the archives: 
http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser -> look 
for spork.






Basically Matei said:



Yup, that was it, though I believe people at Twitter picked it up again 
recently. I’d suggest asking Dmitriy if you know him. I’ve seen interest in 
this from several other groups, and if there’s enough of it, maybe we can 
start another open source repo to track it. The work in that repo you pointed 
to was done over one week, and already had most of Pig’s operators working. 
(I helped out with this prototype over Twitter’s hack week.) That work also 
calls the Scala API directly, because it was done before we had a Java API; 
it should be easier with the Java one.

Tom 
 
 


    On Thursday, March 6, 2014 3:11 PM, Sameer Tilak <ssti...@live.com> wrote:



    


Hi everyone,



We are using Pig to build our data pipeline. I came across Spork -- Pig on 
Spark at https://github.com/dvryaboy/pig and am not sure if it is still active.

Can someone please let me know the status of Spork or any other effort that 
will let us run Pig on Spark? We can significantly benefit by using Spark, but 
we would like to keep using the existing Pig scripts.





      

-- 
"...:::Aniket:::... Quetzalco@tl"
