+spark-user

---------- Forwarded message ----------
From: lonely Feb <lonely8...@gmail.com>
Date: 2015-01-16 19:09 GMT+08:00
Subject: Re: Problems with TeraValidate
To: Ewan Higgs <ewan.hi...@ugent.be>

thx a lot. btw, here is my output:

1. when dataset is 1000g:

num records: 10000000000
checksum: 12aa5028310ea763e
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
part 1
lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
    at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
    at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2. when dataset is 200m:

num records: 2000000
checksum: ca93e5d2fad40
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
part 1
lastMaxArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
    at scala.Predef$.assert(Predef.scala:179)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
    at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
    at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
    at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I suspect sth. is wrong with the function "clone".

2015-01-16 19:02 GMT+08:00 Ewan Higgs <ewan.hi...@ugent.be>:

> Hi Ionely,
> I am looking at this now. If you need to validate a terasort benchmark as
> soon as possible, I would use Hadoop's TeraValidate.
>
> I'll let you know when I have a fix.
>
> Yours,
> Ewan Higgs
>
>
> On 16/01/15 09:47, lonely Feb wrote:
>
>> Hi i run your terasort program on my spark cluster, when the dataset is
>> small (below 1000g) everything goes fine, but when the dataset is over
>> 1000g, the TeraValidate always assert error with:
>> current partition min < last partition max
>>
>> eg. output is :
>> num records: 10000000000
>> checksum: 12aa5028310ea763e
>> part 0
>> lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> part 1
>> lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> Exception in thread "main" java.lang.AssertionError: assertion failed:
>> current partition min < last partition max
>>     at scala.Predef$.assert(Predef.scala:179)
>>     at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
>>     at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
>>     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>     at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>     at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
>>     at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
>>     at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
>>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> what's the problem?
>>
>
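
For anyone reading the reported output by hand, the failing condition can be reproduced outside Spark. The following is only a minimal Scala sketch, not the code from TeraValidate.scala: the object and method names are hypothetical, keys are assumed to be compared lexicographically as unsigned byte values (which matches the 0-255 values printed), and only adjacent partitions are compared. Fed the min and max printed for the 1000g run, it reports the partitions as out of order, because part 1's min sorts below part 0's max.

```scala
// Hypothetical helper for checking the reported values by hand.
// Assumption: keys are compared lexicographically as unsigned bytes (0-255).
object PartitionOrderCheck {

  // Negative if a sorts before b, positive if after, zero if equal.
  def compareKeys(a: Seq[Int], b: Seq[Int]): Int = {
    val firstDiff = a.zip(b).collectFirst { case (x, y) if x != y => x - y }
    firstDiff.getOrElse(a.length - b.length)
  }

  // True when each partition's min key is >= the previous partition's max key.
  // Each element of parts is a (min, max) pair for one partition.
  def partitionsOrdered(parts: Seq[(Seq[Int], Seq[Int])]): Boolean =
    parts.zip(parts.drop(1)).forall {
      case ((_, prevMax), (curMin, _)) => compareKeys(curMin, prevMax) >= 0
    }

  def main(args: Array[String]): Unit = {
    // min and max as printed for both part 0 and part 1 in the 1000g run.
    val min = Seq(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
    val max = Seq(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
    val reported = Seq((min, max), (min, max))
    println(partitionsOrdered(reported)) // false
  }
}
```

In both runs, part 0 and part 1 print identical min and max keys, so the check cannot pass for those values regardless of dataset size.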