Not sure if this is the place to ask, but I am using the terasort branch of Spark for benchmarking, as found at https://github.com/ehiggs/spark/tree/terasort, and I get the error below when running on two machines (a single machine works fine). Looking at the code, listed below the error message, I see "while (read < TeraInputFormat.RECORD_LEN) {". Could this loop be what prevents the branch from running on a cluster? Has anybody managed to run this branch on a cluster?
Thanks,
Tom

15/02/25 17:55:42 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, arlab152): org.apache.hadoop.fs.ChecksumException: Checksum error: file:/home/th/terasort_in/part-r-00000 at 49999872 exp: 1592400191 got: -1117747586
        at org.apache.hadoop.fs.FSInputChecker.verifySums(FSInputChecker.java:322)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:278)
        at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:213)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:231)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:195)
        at java.io.DataInputStream.read(DataInputStream.java:161)
        at org.apache.spark.examples.terasort.TeraInputFormat$TeraRecordReader.nextKeyValue(TeraInputFormat.scala:91)

Code:

  override def nextKeyValue(): Boolean = {
    if (offset >= length) {
      return false
    }
    var read: Int = 0
    while (read < TeraInputFormat.RECORD_LEN) {
      var newRead: Int = in.read(buffer, read, TeraInputFormat.RECORD_LEN - read)
      if (newRead == -1) {
        if (read == 0) false else throw new EOFException("read past eof")
      }
      read += newRead
    }
    if (key == null) {
      key = new Array[Byte](TeraInputFormat.KEY_LEN)
    }
    if (value == null) {
      value = new Array[Byte](TeraInputFormat.VALUE_LEN)
    }
    buffer.copyToArray(key, 0, TeraInputFormat.KEY_LEN)
    buffer.takeRight(TeraInputFormat.VALUE_LEN).copyToArray(value, 0, TeraInputFormat.VALUE_LEN)
    offset += TeraInputFormat.RECORD_LEN
    true
  }

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-when-running-the-terasort-branche-in-a-cluster-tp21808.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
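As an aside on the quoted loop: "while (read < TeraInputFormat.RECORD_LEN)" is a standard read-fully pattern and by itself should not restrict cluster execution, it just assembles one 100-byte record across possibly partial reads. One oddity is that the bare "false" inside the loop is not returned in Scala (a "return" appears to be missing). Below is a minimal, self-contained sketch of the same loop shape with that "return" made explicit; the stream wrapper (ChoppyStream) and the demo object are my own names, standing in for the Hadoop FSDataInputStream, and are not part of the branch:

```scala
import java.io.{ByteArrayInputStream, EOFException, InputStream}

object ReadFullyDemo {
  // TeraSort records are 100 bytes: a 10-byte key plus a 90-byte value.
  val RECORD_LEN = 100

  // Hypothetical wrapper that returns at most 7 bytes per read(),
  // simulating the partial reads a real filesystem stream may produce.
  class ChoppyStream(in: InputStream) extends InputStream {
    override def read(): Int = in.read()
    override def read(b: Array[Byte], off: Int, len: Int): Int =
      in.read(b, off, math.min(len, 7))
  }

  // Same loop shape as TeraRecordReader.nextKeyValue, with `return false`
  // made explicit on a clean EOF at a record boundary.
  def readRecord(in: InputStream, buffer: Array[Byte]): Boolean = {
    var read = 0
    while (read < RECORD_LEN) {
      val newRead = in.read(buffer, read, RECORD_LEN - read)
      if (newRead == -1) {
        if (read == 0) return false                 // no more records
        else throw new EOFException("read past eof") // truncated record
      }
      read += newRead
    }
    true
  }

  def main(args: Array[String]): Unit = {
    val data = Array.tabulate[Byte](RECORD_LEN)(i => (i % 127).toByte)
    val in = new ChoppyStream(new ByteArrayInputStream(data))
    val buffer = new Array[Byte](RECORD_LEN)
    assert(readRecord(in, buffer))      // full record assembled from 7-byte reads
    assert(buffer.sameElements(data))
    assert(!readRecord(in, buffer))     // clean EOF -> false, no exception
    println("ok")
  }
}
```

So the loop reads exactly RECORD_LEN bytes regardless of how the stream chunks them. The ChecksumException itself, on the other hand, comes from Hadoop's local filesystem checksum verification of file:/home/th/terasort_in/..., which points at the input data rather than at this loop.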