Fwd: Problems with TeraValidate

lonely Feb Fri, 16 Jan 2015 03:12:05 -0800

+spark-user

---------- Forwarded message ----------
From: lonely Feb <lonely8...@gmail.com>
Date: 2015-01-16 19:09 GMT+08:00
Subject: Re: Problems with TeraValidate
To: Ewan Higgs <ewan.hi...@ugent.be>



Thanks a lot.
By the way, here is my output:

1. When the dataset is 1000 GB:
num records: 10000000000
checksum: 12aa5028310ea763e
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
part 1
lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
        at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
        at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
        at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2. When the dataset is 200 MB:
num records: 2000000
checksum: ca93e5d2fad40
part 0
lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
part 1
lastMaxArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
min ArrayBuffer(82, 24, 27, 218, 62, 68, 174, 208, 69, 78)
max ArrayBuffer(146, 177, 217, 195, 175, 144, 239, 81, 29, 252)
Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
        at scala.Predef$.assert(Predef.scala:179)
        at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
        at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
        at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
        at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I suspect something is wrong with the "clone" function. In both runs, part 1
reports exactly the same min and max as part 0.
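
To illustrate how a missing clone could produce this kind of output, here is a
minimal Scala sketch. It is not the TeraValidate code, and it rests on an
assumption this thread does not confirm: that each partition's min and max keys
are written into buffers that are reused across partitions. All names
(CloneSketch, key, summarize, ordered) are hypothetical.

```scala
object CloneSketch {
  type Key = Array[Byte]

  // Build a key from unsigned byte values (0 to 255).
  def key(bs: Int*): Key = bs.map(_.toByte).toArray

  // Render a key as unsigned values, the way the log output does.
  def unsigned(k: Key): Seq[Int] = k.map(_ & 0xff).toSeq

  // Lexicographic comparison, treating every byte as unsigned.
  def compareKeys(a: Key, b: Key): Int = {
    var i = 0
    while (i < a.length && i < b.length) {
      val diff = (a(i) & 0xff) - (b(i) & 0xff)
      if (diff != 0) return diff
      i += 1
    }
    a.length - b.length
  }

  // Summarise every partition as (min, max). The two scratch buffers are
  // shared by all partitions (the assumed bug); `copy` decides whether each
  // result is cloned before it is stored.
  def summarize(partitions: Seq[Seq[Key]], copy: Boolean): Seq[(Seq[Int], Seq[Int])] = {
    val keyLen = partitions.head.head.length
    val min = new Array[Byte](keyLen)
    val max = new Array[Byte](keyLen)
    val results = partitions.toList.map { part =>
      val lo = part.reduce((a, b) => if (compareKeys(a, b) <= 0) a else b)
      val hi = part.reduce((a, b) => if (compareKeys(a, b) >= 0) a else b)
      Array.copy(lo, 0, min, 0, keyLen)
      Array.copy(hi, 0, max, 0, keyLen)
      if (copy) (min.clone(), max.clone()) else (min, max)
    }
    // Convert only after all partitions were processed, so aliasing shows up.
    results.map { case (lo, hi) => (unsigned(lo), unsigned(hi)) }
  }

  // True when each partition's min is not below the previous partition's max.
  def ordered(summary: Seq[(Seq[Int], Seq[Int])]): Boolean =
    summary.zip(summary.drop(1)).forall { case ((_, prevMax), (curMin, _)) =>
      compareKeys(key(prevMax: _*), key(curMin: _*)) <= 0
    }
}
```

With copy = false, every partition ends up reporting the last partition's min
and max, so the order check fails even on correctly sorted input. With
copy = true, each partition keeps its own values and the check passes.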

2015-01-16 19:02 GMT+08:00 Ewan Higgs <ewan.hi...@ugent.be>:

> Hi lonely,
> I am looking at this now. If you need to validate a terasort benchmark as
> soon as possible, I would use Hadoop's TeraValidate.
>
> I'll let you know when I have a fix.
>
> Yours,
> Ewan Higgs
>
>
> On 16/01/15 09:47, lonely Feb wrote:
>
>> Hi, I ran your terasort program on my Spark cluster. When the dataset is
>> small (below 1000 GB) everything works fine, but when the dataset is over
>> 1000 GB, TeraValidate always fails with the assertion error:
>> current partition min < last partition max
>>
>> For example, the output is:
>> num records: 10000000000
>> checksum: 12aa5028310ea763e
>> part 0
>> lastMaxArrayBuffer(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> part 1
>> lastMaxArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> min ArrayBuffer(0, 4, 25, 150, 6, 136, 39, 39, 214, 164)
>> max ArrayBuffer(255, 255, 96, 244, 80, 50, 31, 158, 43, 113)
>> Exception in thread "main" java.lang.AssertionError: assertion failed: current partition min < last partition max
>>         at scala.Predef$.assert(Predef.scala:179)
>>         at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:117)
>>         at org.apache.spark.examples.terasort.TeraValidate$$anonfun$validate$3.apply(TeraValidate.scala:111)
>>         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>         at org.apache.spark.examples.terasort.TeraValidate$.validate(TeraValidate.scala:111)
>>         at org.apache.spark.examples.terasort.TeraValidate$.main(TeraValidate.scala:59)
>>         at org.apache.spark.examples.terasort.TeraValidate.main(TeraValidate.scala)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:616)
>>         at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
>>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>> What's the problem?
>>
>
>
