Move your count operation outside the foreach and use a broadcast to access
it inside the foreach.
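
A minimal sketch of that suggestion, assuming a local SparkContext and made-up ticket data (the object name, app name, and sample records are placeholders, not taken from the original job):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TicketCountSketch {
  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for illustration only.
    val conf = new SparkConf().setAppName("ticket-count-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Stand-in data; the real ticketRdd comes from the original job.
    val ticketRdd = sc.parallelize(Seq("t1", "t2", "t3")).cache()

    // Run the action once on the driver, outside any closure.
    val ticketCount: Long = ticketRdd.count()

    // Ship the plain Long to the executors instead of the RDD itself.
    val ticketCountBc = sc.broadcast(ticketCount)

    ticketRdd.foreach { ticket =>
      // Only the broadcast value is referenced here, never an RDD.
      println(s"$ticket of ${ticketCountBc.value}")
    }

    sc.stop()
  }
}
```

For a single Long, capturing the local val directly in the closure should also work; a broadcast pays off more when the value is large.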
On Aug 17, 2015 10:34 AM, "Priya Ch" <learnings.chitt...@gmail.com> wrote:

> Looks like this is because of SPARK-5063:
> RDD transformations and actions can only be invoked by the driver, not
> inside of other transformations; for example, rdd1.map(x =>
> rdd2.values.count() * x) is invalid because the values transformation and
> count action cannot be performed inside of the rdd1.map transformation. For
> more information, see SPARK-5063.
>
> On Mon, Aug 17, 2015 at 8:13 PM, Preetam <preetam...@gmail.com> wrote:
>
>> The error could be because of the missing parentheses after the word
>> cache: ticketRdd.cache()
>>
>> > On Aug 17, 2015, at 7:26 AM, Priya Ch <learnings.chitt...@gmail.com>
>> wrote:
>> >
>> > Hi All,
>> >
>> >  Thank you very much for the detailed explanation.
>> >
>> > I have a scenario like this:
>> > I have an RDD of ticket records and another RDD of booking records. For
>> > each ticket record, I need to check whether any link exists in the
>> > booking table.
>> >
>> > val ticketCachedRdd = ticketRdd.cache
>> >
>> > ticketRdd.foreach { ticket =>
>> >   // this function queries the booking table and retrieves the booking rows
>> >   val bookingRecords = queryOnBookingTable(date, flightNumber, flightCarrier)
>> >   println(ticketCachedRdd.count) // this is throwing a NullPointerException
>> > }
>> >
>> > Is there something wrong with the count? I am trying to use the count of
>> > the cached RDD while looping through the actual RDD. What's wrong with this?
>> >
>> > Thanks,
>> > Padma Ch
>>
>
>
