If this were run in cluster mode, you should provide a path on a shared file system, e.g., HDFS, instead of a local path. If this is in local mode, I'm not sure what went wrong.
On Wed, May 20, 2015 at 2:09 PM, Eric Tanner <eric.tan...@justenough.com> wrote:

Here is the stack trace. Thanks for looking at this.

scala> model.freqItemsets.saveAsTextFile("c:///repository/trunk/Scala_210_wspace/fpGrowth/modelText1")
15/05/20 14:07:47 INFO SparkContext: Starting job: saveAsTextFile at <console>:33
15/05/20 14:07:47 INFO DAGScheduler: Got job 15 (saveAsTextFile at <console>:33) with 2 output partitions (allowLocal=false)
15/05/20 14:07:47 INFO DAGScheduler: Final stage: Stage 30(saveAsTextFile at <console>:33)
15/05/20 14:07:47 INFO DAGScheduler: Parents of final stage: List(Stage 29)
15/05/20 14:07:47 INFO DAGScheduler: Missing parents: List()
15/05/20 14:07:47 INFO DAGScheduler: Submitting Stage 30 (MapPartitionsRDD[21] at saveAsTextFile at <console>:33), which has no missing parents
15/05/20 14:07:47 INFO MemoryStore: ensureFreeSpace(131288) called with curMem=724428, maxMem=278302556
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_18 stored as values in memory (estimated size 128.2 KB, free 264.6 MB)
15/05/20 14:07:47 INFO MemoryStore: ensureFreeSpace(78995) called with curMem=855716, maxMem=278302556
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 77.1 KB, free 264.5 MB)
15/05/20 14:07:47 INFO BlockManagerInfo: Added broadcast_18_piece0 in memory on localhost:52396 (size: 77.1 KB, free: 265.1 MB)
15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block broadcast_18_piece0
15/05/20 14:07:47 INFO SparkContext: Created broadcast 18 from broadcast at DAGScheduler.scala:839
15/05/20 14:07:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage 30 (MapPartitionsRDD[21] at saveAsTextFile at <console>:33)
15/05/20 14:07:47 INFO TaskSchedulerImpl: Adding task set 30.0 with 2 tasks
15/05/20 14:07:47 INFO BlockManager: Removing broadcast 17
15/05/20 14:07:47 INFO TaskSetManager: Starting task 0.0 in stage 30.0 (TID 33, localhost, PROCESS_LOCAL, 1056 bytes)
15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_17_piece0
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_17_piece0 of size 4737 dropped from memory (free 277372582)
15/05/20 14:07:47 INFO TaskSetManager: Starting task 1.0 in stage 30.0 (TID 34, localhost, PROCESS_LOCAL, 1056 bytes)
15/05/20 14:07:47 INFO BlockManagerInfo: Removed broadcast_17_piece0 on localhost:52396 in memory (size: 4.6 KB, free: 265.1 MB)
15/05/20 14:07:47 INFO Executor: Running task 1.0 in stage 30.0 (TID 34)
15/05/20 14:07:47 INFO Executor: Running task 0.0 in stage 30.0 (TID 33)
15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block broadcast_17_piece0
15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_17
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_17 of size 6696 dropped from memory (free 277379278)
15/05/20 14:07:47 INFO ContextCleaner: Cleaned broadcast 17
15/05/20 14:07:47 INFO BlockManager: Removing broadcast 16
15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_16_piece0
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_16_piece0 of size 4737 dropped from memory (free 277384015)
15/05/20 14:07:47 INFO BlockManagerInfo: Removed broadcast_16_piece0 on localhost:52396 in memory (size: 4.6 KB, free: 265.1 MB)
15/05/20 14:07:47 INFO BlockManagerMaster: Updated info of block broadcast_16_piece0
15/05/20 14:07:47 INFO BlockManager: Removing block broadcast_16
15/05/20 14:07:47 INFO MemoryStore: Block broadcast_16 of size 6696 dropped from memory (free 277390711)
15/05/20 14:07:47 INFO ContextCleaner: Cleaned broadcast 16
15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
15/05/20 14:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/05/20 14:07:47 ERROR Executor: Exception in task 1.0 in stage 30.0 (TID 34)
java.lang.NullPointerException
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
        at org.apache.hadoop.util.Shell.run(Shell.java:455)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:808)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:791)
        at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:656)
        at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:490)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:462)
        at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:428)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:908)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:801)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:123)
        at org.apache.spark.SparkHadoopWriter.open(SparkHadoopWriter.scala:90)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1059)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

On Wed, May 20, 2015 at 2:05 PM, Xiangrui Meng <men...@gmail.com> wrote:

Could you post the stack trace? If you are using Spark 1.3 or 1.4, it would be easier to save freq itemsets as a Parquet file. -Xiangrui

On Wed, May 20, 2015 at 12:16 PM, Eric Tanner <eric.tan...@justenough.com> wrote:

I am having trouble with saving an FP-Growth model as a text file. I can print out the results, but when I try to save the model I get a NullPointerException.

model.freqItemsets.saveAsTextFile("c://fpGrowth/model")

Thanks,

Eric
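A note on the trace above: a NullPointerException thrown from java.lang.ProcessBuilder.start via org.apache.hadoop.util.Shell.runCommand, reached through RawLocalFileSystem.setPermission, is a commonly reported symptom of running Spark on Windows without Hadoop's native helper binary (winutils.exe) on the path, since Hadoop shells out to it when writing to the local file system. The sketch below is a possible workaround, not something confirmed in this thread, and the C:\hadoop location is an assumption; substitute wherever winutils.exe actually lives.

```shell
# Hypothetical Windows setup (assumption: winutils.exe has been placed in
# C:\hadoop\bin). In a Windows command prompt, before starting spark-shell:
#
#   set HADOOP_HOME=C:\hadoop
#   set PATH=%HADOOP_HOME%\bin;%PATH%
#
# Then retry model.freqItemsets.saveAsTextFile(...) in a fresh shell.
```

If the failure persists after this, posting the Spark and Hadoop versions alongside the trace would help narrow it down.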