Re: rdd count is throwing null pointer exception
Move your count operation outside the foreach and use a broadcast to access it inside the foreach.

On Aug 17, 2015 10:34 AM, Priya Ch learnings.chitt...@gmail.com wrote:

It looks like this is because of SPARK-5063. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

On Mon, Aug 17, 2015 at 8:13 PM, Preetam preetam...@gmail.com wrote:

The error could be because of the missing brackets after the word cache: ticketRdd.cache()

On Aug 17, 2015, at 7:26 AM, Priya Ch learnings.chitt...@gmail.com wrote:

Hi All,

Thank you very much for the detailed explanation. I have a scenario like this: I have an RDD of ticket records and another RDD of booking records. For each ticket record, I need to check whether any link exists in the booking table.

    val ticketCachedRdd = ticketRdd.cache
    ticketRdd.foreach { ticket =>
      // this function queries the booking table and retrieves the booking rows
      val bookingRecords = queryOnBookingTable(date, flightNumber, flightCarrier)
      println(ticketCachedRdd.count) // this is throwing Null pointer exception
    }

Is there something wrong with the count? I am trying to use the count of the cached RDD while looping through the actual RDD. What is wrong with this?

Thanks,
Padma Ch
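For readers following along, the advice at the top of this reply (run the count on the driver, then broadcast the result) might look roughly like the sketch below. It is only an illustration under stated assumptions: the Ticket case class, the sample rows, and the local master setting are hypothetical, since the thread never shows the real schema or cluster setup, and the call to queryOnBookingTable from the original post is left as a comment.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TicketCountExample {
  // Hypothetical record type; the thread does not show the real schema.
  case class Ticket(date: String, flightNumber: String, flightCarrier: String)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val conf = new SparkConf().setAppName("ticket-count").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical sample data standing in for the real ticket records.
      val ticketRdd = sc.parallelize(Seq(
        Ticket("2015-08-17", "101", "AA"),
        Ticket("2015-08-17", "202", "BA")
      )).cache()

      // The action runs on the driver, outside any closure sent to executors.
      val ticketCount: Long = ticketRdd.count()

      // Broadcast the plain Long so that tasks read a value, not an RDD.
      val ticketCountBc = sc.broadcast(ticketCount)

      ticketRdd.foreach { ticket =>
        // queryOnBookingTable(...) from the original post would be called here.
        println(s"${ticket.flightCarrier}${ticket.flightNumber}: " +
          s"total tickets = ${ticketCountBc.value}")
      }
    } finally {
      sc.stop()
    }
  }
}
```

For a single Long, capturing ticketCount directly in the closure would probably work just as well; the broadcast is shown because that is what the reply suggests, and it matters more when the shared value is large.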
rdd count is throwing null pointer exception
Hi All,

Thank you very much for the detailed explanation. I have a scenario like this: I have an RDD of ticket records and another RDD of booking records. For each ticket record, I need to check whether any link exists in the booking table.

    val ticketCachedRdd = ticketRdd.cache
    ticketRdd.foreach { ticket =>
      // this function queries the booking table and retrieves the booking rows
      val bookingRecords = queryOnBookingTable(date, flightNumber, flightCarrier)
      println(ticketCachedRdd.count) // this is throwing Null pointer exception
    }

Is there something wrong with the count? I am trying to use the count of the cached RDD while looping through the actual RDD. What is wrong with this?

Thanks,
Padma Ch
Re: rdd count is throwing null pointer exception
The error could be because of the missing brackets after the word cache: ticketRdd.cache()
Re: rdd count is throwing null pointer exception
It looks like this is because of SPARK-5063. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
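The rdd1/rdd2 example in the message above can be turned into a small sketch that contrasts the invalid nested call with one driver-side alternative. The sample data and the local master setting are assumptions made purely for illustration; only rdd1, rdd2, values, and count come from the quoted text.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NestedRddExample {
  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run on one machine.
    val sc = new SparkContext(
      new SparkConf().setAppName("spark-5063-sketch").setMaster("local[*]"))
    try {
      // Hypothetical sample data.
      val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
      val rdd2 = sc.parallelize(Seq("a" -> 10, "b" -> 20))

      // Invalid: rdd2 is used inside a closure that runs on the executors,
      // which is the pattern SPARK-5063 describes.
      // val bad = rdd1.map(x => rdd2.values.count() * x)

      // Valid: run the action on the driver first, then capture the Long.
      val rdd2Count: Long = rdd2.values.count()
      val scaled = rdd1.map(x => rdd2Count * x)

      scaled.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The same reasoning applies to the foreach in the original question: ticketCachedRdd.count is an action on an RDD, so it has to be evaluated on the driver before the foreach rather than inside it.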