How to debug Spark source using IntelliJ/Eclipse

2015-12-05 Thread jatinganhotra
Hi, I am trying to understand Spark's internal code and want to debug the Spark source in order to add a new feature. I have tried the steps laid out on the Spark Wiki page IDE Setup, but they…
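One common way to step through Spark internals, offered here as a sketch rather than the Wiki's procedure, is to run a small driver in local mode inside the IDE. In local mode the driver, scheduler and tasks all run in a single JVM, so breakpoints placed in Spark classes such as `DAGScheduler` are hit directly. This assumes the project either is the Spark source tree itself or depends on `spark-core` with sources attached. The object name `DebugSparkLocally` and the helper `wordCount` are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical entry point used only to drive Spark code under an IDE debugger.
object DebugSparkLocally {
  // A tiny job that exercises a shuffle (reduceByKey), so the scheduler
  // creates two stages you can follow with breakpoints.
  def wordCount(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines, 2)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _) // breakpoints in DAGScheduler/TaskSetManager trigger once collect() runs
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[2]: everything runs inside this JVM, so "Debug" in the IDE
    // attaches to both the driver and the task threads.
    val conf = new SparkConf().setAppName("debug-spark-source").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(wordCount(sc, Seq("a b a", "b c")))
    } finally {
      sc.stop()
    }
  }
}
```

Launching `main` with the IDE's Debug action is usually enough; no remote-debugging setup is needed for local mode.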

Checkpointing calls the job twice?

2015-10-17 Thread jatinganhotra
Hi, I noticed that when you checkpoint a given RDD, the action is performed twice: I can see 2 jobs being executed in the Spark UI. Example:

    val logFile = "/data/pagecounts"
    sc.setCheckpointDir("/checkpoints")
    val logData = sc.textFile(logFile, 2)
    val as = logData.filter(line =>…
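The second job is expected: `checkpoint()` only marks the RDD, and the data is written by a separate job that Spark launches after the first action finishes. Without persistence, that job recomputes the RDD from its lineage (here, rereading the input). The `RDD.checkpoint` documentation recommends persisting the RDD first so the write job reads cached partitions instead. The sketch below still produces two jobs, but the second should not recompute the filter. It uses an in-memory `Seq` as a stand-in for the text file; `CheckpointWithCache` and `countAndCheckpoint` are hypothetical names.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CheckpointWithCache {
  // Counts lines containing `needle` while checkpointing the filtered RDD.
  // Returns the count and whether the RDD reports itself as checkpointed.
  def countAndCheckpoint(sc: SparkContext, lines: Seq[String], needle: String,
                         checkpointDir: String): (Long, Boolean) = {
    sc.setCheckpointDir(checkpointDir)
    val filtered = sc.parallelize(lines, 2).filter(_.contains(needle))
    // Persist before checkpointing so the checkpoint-writing job reads the
    // cached partitions instead of re-running the filter over the source data.
    filtered.persist(StorageLevel.MEMORY_ONLY)
    filtered.checkpoint()    // only marks the RDD; nothing is written yet
    val n = filtered.count() // job 1 is the count; a second job then writes the checkpoint
    (n, filtered.isCheckpointed)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-with-cache").setMaster("local[2]"))
    try {
      val dir = Files.createTempDirectory("checkpoints").toString
      println(countAndCheckpoint(sc, Seq("a x", "b y", "a z"), "a", dir))
    } finally {
      sc.stop()
    }
  }
}
```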

Query about checkpointing time

2015-09-30 Thread jatinganhotra
Hi, I started doing the amp-camp 5 exercises. I tried the following 2 scenarios:

*Scenario #1*

    val pagecounts = sc.textFile("data/pagecounts")
    pagecounts.checkpoint
    pagecounts.count

*Scenario #2*

    val pagecounts =…
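When comparing checkpointing time, it helps to separate the first action (which also pays for writing the checkpoint) from later actions (which read the checkpoint files instead of recomputing the lineage). The sketch below times both with wall-clock measurements. It is a rough illustration, not a benchmark: a single local-mode run includes JIT warm-up and file-system caching effects. It uses an in-memory `Seq` in place of `data/pagecounts`, and `CheckpointTiming`, `timed` and `compare` are hypothetical names.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CheckpointTiming {
  // Evaluates `body` once; returns its result and elapsed wall-clock time in ms.
  def timed[T](body: => T): (T, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }

  // Times the first count (which also triggers the checkpoint write) and a
  // second count (which reads the checkpointed data rather than the lineage).
  def compare(sc: SparkContext, data: Seq[String], checkpointDir: String): Seq[(String, Long, Double)] = {
    sc.setCheckpointDir(checkpointDir)
    val rdd: RDD[String] = sc.parallelize(data, 2)
    rdd.checkpoint()
    val (c1, t1) = timed(rdd.count())
    val (c2, t2) = timed(rdd.count())
    Seq(("first count + checkpoint write", c1, t1), ("count after checkpoint", c2, t2))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-timing").setMaster("local[2]"))
    try {
      val dir = Files.createTempDirectory("checkpoints").toString
      compare(sc, (1 to 10000).map(i => s"line $i"), dir).foreach { case (label, count, ms) =>
        println(f"$label%-32s count=$count%d time=$ms%.1f ms")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that `setCheckpointDir` must be called before `checkpoint()`; otherwise Spark throws an exception when the RDD is marked.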