Re: running the Terasort example
On 12/16/14, 11:42 PM, "Ewan Higgs" wrote:

>Hi Tim,
>
>> On 16 Dec 2014, at 19:27, Tim Harsch wrote:
>>
>> Hi Ewan,
>> Thanks, I think I was just a bit confused at the time, I was looking at
>> the spark-perf repo when there was the problem (uh.. ok)…
>>
>The PR that I am working on is indeed for spark-perf.

Yes, but the example usage you gave is for the code in ehiggs/spark (which is where I got myself confused):

$ git remote show origin
* remote origin
  Fetch URL: g...@github.com:ehiggs/spark.git
  Push URL: g...@github.com:ehiggs/spark.git
…
$ ll bin/run-example
-rwxr-xr-x 1 tharsch 513 2.1K Dec 11 21:02 bin/run-example

run-example is not in spark-perf. What is the expected usage for the code that is in spark-perf? I'm hoping I'll have time to run it later today, so hopefully I will figure it out on my own.

>> …snip...
>>
>> I can get past this by setting hadoop.version to 2.5.0 in the parent
>> pom.
>>
>I wasn't sure how to get this working across all the Hadoop versions so I
>made it work with 2.4.0 and above. If you have advice on back porting
>this then I'm happy to implement it.

I would like to try, hopefully I can find the time.

>NB, TeraValidate may not be functioning appropriately. If you have
>trouble with it, I recommend using the Hadoop version.

Thanks for the warning, I bet I could have banged my head on that for hours.

>Yours,
>Ewan
>
>> Thanks,
>> Tim
Re: running the Terasort example
Hi Ewan,

Thanks, I think I was just a bit confused at the time, I was looking at the spark-perf repo when there was the problem (uh.. ok)…

I notice now with a pull down just minutes back that I still get a compile problem.

[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraInputFormat.scala:40: object task is not a member of package org.apache.hadoop.mapreduce
[ERROR] import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[ERROR] ^
[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraInputFormat.scala:132: not found: type TaskAttemptContextImpl
[ERROR] val context = new TaskAttemptContextImpl(
[ERROR] ^
[ERROR] /Users/tharsch/git/ehiggs/spark/examples/src/main/scala/org/apache/spark/examples/terasort/TeraOutputFormat.scala:76: value hsync is not a member of org.apache.hadoop.fs.FSDataOutputStream
[ERROR] out.hsync();
[ERROR] ^

I can get past this by setting hadoop.version to 2.5.0 in the parent pom.

Thanks,
Tim

On 12/16/14, 12:38 AM, "Ewan Higgs" wrote:

>Hi Tim,
>run-example is here:
>https://github.com/ehiggs/spark/blob/terasort/bin/run-example
>
>It should be in the repository that you cloned. So if you were at the
>top level of the checkout, run-example would be run as ./bin/run-example.
>
>Yours,
>Ewan Higgs
>
>On 12/12/14 01:06, Tim Harsch wrote:
>> Hi all,
>> I just joined the list, so I don't have a message history that would
>> allow me to reply to this post:
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/Terasort-example-td9284.html
>>
>> I am interested in running the terasort example. I cloned the repo
>> https://github.com/ehiggs/spark and did checkout of the terasort branch.
>> In the above referenced post Ewan gives the example
>>
>> # Generate 1M 100 byte records:
>> ./bin/run-example terasort.TeraGen 100M ~/data/terasort_in
>>
>> I don't see a "run-example" in that repo. I'm sure I am missing
>> something basic, or less likely, maybe some changes weren't pushed?
>>
>> Thanks for any help,
>> Tim
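Editorial note on the workaround above: a minimal sketch, assuming the checkout builds with Maven and reads hadoop.version as an ordinary Maven property, so the value can be passed with -D on the command line instead of being edited into the parent pom. The variable names HADOOP_VERSION and BUILD_CMD are made up for this illustration, and the thread does not say whether additional build profiles are also needed.

```shell
# Sketch only: override hadoop.version per build instead of editing the parent pom.
# HADOOP_VERSION and BUILD_CMD are hypothetical names used for this illustration.
# 2.5.0 is the value reported to work in the message above.
HADOOP_VERSION="2.5.0"

# -D sets a Maven property for this invocation; -DskipTests skips running the tests.
BUILD_CMD="mvn -Dhadoop.version=${HADOOP_VERSION} -DskipTests clean package"

# Print the command for review; run it from the top level of the checkout.
echo "${BUILD_CMD}"
```

Passing the property on the command line leaves the pom untouched, which makes it easier to try several Hadoop versions in turn.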
running the Terasort example
Hi all,

I just joined the list, so I don't have a message history that would allow me to reply to this post:

http://apache-spark-developers-list.1001551.n3.nabble.com/Terasort-example-td9284.html

I am interested in running the terasort example. I cloned the repo https://github.com/ehiggs/spark and did checkout of the terasort branch. In the above referenced post Ewan gives the example

# Generate 1M 100 byte records:
./bin/run-example terasort.TeraGen 100M ~/data/terasort_in

I don't see a "run-example" in that repo. I'm sure I am missing something basic, or less likely, maybe some changes weren't pushed?

Thanks for any help,
Tim

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org