Hi Samsudhin,
If possible, could you share a snippet of your code? Or try the unit tests in
RandomForestSuite to see if the issue reproduces.
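For reference, a save/load round trip with the 1.x MLlib tree API looks roughly
like the sketch below (the data path, output path, and training parameters are
placeholders, not a tested repro):

    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.mllib.util.MLUtils

    // Train a small forest on the bundled sample data (placeholder path;
    // the parameters are illustrative, not tuned).
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    val model = RandomForest.trainClassifier(data, numClasses = 2,
      categoricalFeaturesInfo = Map[Int, Int](), numTrees = 3,
      featureSubsetStrategy = "auto", impurity = "gini",
      maxDepth = 4, maxBins = 32)

    // Round-trip the model through the local file system.
    model.save(sc, "/tmp/myRandomForestModel")
    val sameModel = RandomForestModel.load(sc, "/tmp/myRandomForestModel")

Note that save() writes a metadata/ directory (JSON) plus a data/ directory
(Parquet), which is what load() is reading in the log you pasted.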
Regards,
yuhao
-----Original Message-----
From: samsudhin [mailto:samsud...@pigstick.com]
Sent: Tuesday, June 23, 2015 2:14 PM
To: user@spark.apache.org
Subject: MLLIB - Storing the Trained Model
Hi All,
I was trying to store a trained model to the local hard disk. I am able to save
it using the save() function, but when I try to retrieve the stored model using
the load() function I end up with the following error. Kindly help me with this.
scala> val sameModel = RandomForestModel.load(sc, "/home/ec2-user/myModel")
15/06/23 02:04:25 INFO MemoryStore: ensureFreeSpace(255260) called with
curMem=592097, maxMem=278302556
15/06/23 02:04:25 INFO MemoryStore: Block broadcast_6 stored as values in
memory (estimated size 249.3 KB, free 264.6 MB)
15/06/23 02:04:25 INFO MemoryStore: ensureFreeSpace(36168) called with
curMem=847357, maxMem=278302556
15/06/23 02:04:25 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in
memory (estimated size 35.3 KB, free 264.6 MB)
15/06/23 02:04:25 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on
localhost:42290 (size: 35.3 KB, free: 265.3 MB)
15/06/23 02:04:25 INFO BlockManagerMaster: Updated info of block
broadcast_6_piece0
15/06/23 02:04:25 INFO SparkContext: Created broadcast 6 from textFile at
modelSaveLoad.scala:125
15/06/23 02:04:25 INFO FileInputFormat: Total input paths to process : 1
15/06/23 02:04:25 INFO SparkContext: Starting job: first at
modelSaveLoad.scala:125
15/06/23 02:04:25 INFO DAGScheduler: Got job 3 (first at
modelSaveLoad.scala:125) with 1 output partitions (allowLocal=true)
15/06/23 02:04:25 INFO DAGScheduler: Final stage: Stage 3(first at
modelSaveLoad.scala:125)
15/06/23 02:04:25 INFO DAGScheduler: Parents of final stage: List()
15/06/23 02:04:25 INFO DAGScheduler: Missing parents: List()
15/06/23 02:04:25 INFO DAGScheduler: Submitting Stage 3
(/home/ec2-user/myModel/metadata MapPartitionsRDD[7] at textFile at
modelSaveLoad.scala:125), which has no missing parents
15/06/23 02:04:25 INFO MemoryStore: ensureFreeSpace(2680) called with
curMem=883525, maxMem=278302556
15/06/23 02:04:25 INFO MemoryStore: Block broadcast_7 stored as values in
memory (estimated size 2.6 KB, free 264.6 MB)
15/06/23 02:04:25 INFO MemoryStore: ensureFreeSpace(1965) called with
curMem=886205, maxMem=278302556
15/06/23 02:04:25 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in
memory (estimated size 1965.0 B, free 264.6 MB)
15/06/23 02:04:25 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on
localhost:42290 (size: 1965.0 B, free: 265.3 MB)
15/06/23 02:04:25 INFO BlockManagerMaster: Updated info of block
broadcast_7_piece0
15/06/23 02:04:25 INFO SparkContext: Created broadcast 7 from broadcast at
DAGScheduler.scala:839
15/06/23 02:04:25 INFO DAGScheduler: Submitting 1 missing tasks from Stage 3
(/home/ec2-user/myModel/metadata MapPartitionsRDD[7] at textFile at
modelSaveLoad.scala:125)
15/06/23 02:04:25 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
15/06/23 02:04:25 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3,
localhost, PROCESS_LOCAL, 1311 bytes)
15/06/23 02:04:25 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
15/06/23 02:04:25 INFO HadoopRDD: Input split:
file:/home/ec2-user/myModel/metadata/part-0:0+97
15/06/23 02:04:25 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3).
1989 bytes result sent to driver
15/06/23 02:04:25 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID
3) in 10 ms on localhost (1/1)
15/06/23 02:04:25 INFO DAGScheduler: Stage 3 (first at
modelSaveLoad.scala:125) finished in 0.010 s
15/06/23 02:04:25 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have
all completed, from pool
15/06/23 02:04:25 INFO DAGScheduler: Job 3 finished: first at
modelSaveLoad.scala:125, took 0.016193 s
15/06/23 02:04:25 WARN FSInputChecker: Problem opening checksum file:
file:/home/ec2-user/myModel/data/_temporary/0/_temporary/attempt_201506230149_0027_r_01_0/part-r-2.parquet.
Ignoring exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:149)
at
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at
parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:402)
at
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:298)
at
org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$refresh$6.apply(newParquet.scala:297)
at
scala.collection.parallel.mutable.ParArray$Map.leaf(ParArray.scala:658)
at